Abstract

We present a generalization of the method of the local relaxation flow to establish the
universality of local spectral statistics of a broad class of large random matrices. We show that
the local distribution of the eigenvalues coincides with the local statistics of the corresponding
Gaussian ensemble provided the distribution of the individual matrix element is smooth and
the eigenvalues $\{x_j\}_{j=1}^N$ are close to their classical location $\{\gamma_j\}_{j=1}^N$
determined by the limiting density of eigenvalues. Under the scaling where the typical distance
between neighboring eigenvalues is of order $1/N$, the necessary a priori estimate on the location
of eigenvalues requires only to know that $\mathbb{E}|x_j - \gamma_j|^2 \le N^{-1-\varepsilon}$ on
average. This information can be obtained by well established methods for various matrix ensembles.
We demonstrate the method by proving local spectral universality for sample covariance matrices.

AMS Subject Classification (2010): 15B52, 82B44 
Running title: Local relaxation flow 

Keywords: Random matrix, sample covariance matrix, Wishart matrix, Wigner-Dyson statistics 

1 Introduction 

A central question concerning random matrices is the universality conjecture, which states that the
local statistics of eigenvalues of large $N \times N$ square matrices $H$ are determined by the
symmetry type of the ensembles but are otherwise independent of the details of the distributions.
In particular, they coincide with those of the corresponding Gaussian ensemble. The most commonly
studied ensembles are

(i) hermitian, symmetric and quaternion self-dual matrices with identically distributed and 
centered entries that are independent (subject to the natural restriction of the symmetry); 

(ii) sample covariance matrices of the form $H = A^*A$, where $A$ is an $M \times N$ matrix with
centered real or complex i.i.d. entries.

There are two types of universalities: the edge universality and the bulk universality concerning 
energy levels near the spectral edges and in the interior of the spectrum, respectively. Since the 
works of Sinai and Soshnikov [35, 37], the edge universality is commonly approached via the fairly
robust moment method [33, 22, 36, 38, 31]; very recently an alternative approach was given in [ID].

The bulk universality is a subtler problem. In the simplest case of the hermitian Wigner
ensemble, it states that, independent of the distribution of the entries, the local $k$-point
correlation functions of the eigenvalues (see (2.3) for the precise definition later), after
appropriate rescaling and in the $N \to \infty$ limit, are given by the determinant of the sine kernel

\[ \det\big(K(x_\ell - x_j)\big)_{\ell, j = 1}^k, \qquad K(x) = \frac{\sin \pi x}{\pi x}. \tag{1.1} \]

A similar statement is expected to hold for all other ensembles mentioned above, but the explicit
formulas are somewhat more complicated. Detailed formulas for the different Wigner ensembles
can be found, e.g., in [30]. The various sample covariance ensembles have the same local statistics
for their singular values as the local eigenvalue statistics of the corresponding Wigner ensembles.
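
As a numerical illustration of (1.1), the following minimal Python sketch evaluates the determinant of the sine kernel at given rescaled positions. The function names (`sine_kernel`, `det`, `sine_kernel_correlation`) are illustrative choices and not notation from the paper, and the positions are assumed to be already rescaled to unit mean spacing.

```python
import math

def sine_kernel(x):
    """K(x) = sin(pi x) / (pi x), with the removable singularity K(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)

def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def sine_kernel_correlation(points):
    """k-point function det(K(x_l - x_j))_{l,j=1}^k at rescaled positions."""
    return det([[sine_kernel(xl - xj) for xj in points] for xl in points])
```

For two points at rescaled distance r the value reduces to 1 - K(r)^2, which vanishes as r tends to 0; this is the level repulsion encoded in the sine kernel.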

For ensembles of hermitian, symmetric or quaternion self-dual matrices that remain invariant
under the transformations $H \to U^*HU$ for any unitary, orthogonal or symplectic matrix $U$,
respectively, the joint probability density function of all the $N$ eigenvalues can be explicitly
computed. These ensembles are typically given by the probability density

\[ P(H)\,dH \sim \exp\big(-N \operatorname{Tr} V(H)\big)\,dH, \tag{1.2} \]

where V is a real function with sufficient growth at infinity and dH is the flat Lebesgue measure on 
the corresponding symmetry class of matrices. The eigenvalues are strongly correlated and they are 
distributed according to a Gibbs measure with a long range logarithmic interaction potential. The 
joint probability density of the eigenvalues of $H$ with distribution (1.2) can be computed explicitly:

\[ f(x_1, x_2, \ldots, x_N) = (\text{const.}) \prod_{i < j} |x_j - x_i|^\beta \, e^{-N \sum_{j=1}^N V(x_j)}, \tag{1.3} \]

where $\beta = 1, 2, 4$ for symmetric, hermitian and symplectic ensembles, respectively, and const. is
a normalization factor. The formula (1.3) defines a joint probability density of $N$ real random
variables for any $\beta \ge 1$ even when there is no underlying matrix ensemble. This ensemble is
called the invariant $\beta$-ensemble. Quadratic $V$ corresponds to the Gaussian ensembles; we note
that these are the only ensembles that are simultaneously invariant and have i.i.d. matrix entries.
These are called the Gaussian Orthogonal, Unitary and Symplectic Ensembles (GOE, GUE, GSE
for short) in the case of $\beta = 1, 2, 4$, respectively. Somewhat different choices of $V$ lead to
two other classical ensembles, the Laguerre and the Jacobi ensembles, that also have a matrix
interpretation for $\beta = 1, 2, 4$ (e.g., the Laguerre ensemble corresponds to the Gaussian sample
covariance matrices, which are also called Wishart matrices); see [11, 23] for more details. The
local statistics can be obtained via a detailed analysis of orthogonal polynomials on the real line
with respect to the weight function $\exp(-V(x))$. This approach was originally applied to classical
ensembles, which lead to classical orthogonal polynomials, by Dyson [13], Mehta and Gaudin [31] and
Mehta [30]. Later, general methods using orthogonal polynomials were developed to tackle a very
general class of invariant ensembles by Deift et al., see [3 EJ El [10] and references therein, and
also by Bleher and Its [5] and Pastur and Shcherbina [32].
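
To make the structure of the invariant density (1.3) concrete, here is a small Python sketch of its unnormalized logarithm: a logarithmic pair interaction with strength beta plus a confining term. The function name `log_beta_density` and the quadratic potential used in the usage note are assumptions made for illustration only; the normalization constant is not computed.

```python
import math

def log_beta_density(xs, beta, potential):
    """Unnormalized log-density: beta * sum_{i<j} log|x_j - x_i| - N * sum_j V(x_j)."""
    n = len(xs)
    interaction = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            gap = abs(xs[j] - xs[i])
            if gap == 0.0:
                return float("-inf")  # the density vanishes at coinciding points
            interaction += math.log(gap)
    confinement = sum(potential(x) for x in xs)
    return beta * interaction - n * confinement
```

For instance, `log_beta_density([-1.0, 1.0], 2.0, lambda x: x * x / 2)` evaluates the two-point case with a quadratic potential. Configurations with nearly coinciding points receive a strongly negative value, reflecting the repulsion between eigenvalues.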

Many natural matrix ensembles are typically not unitarily invariant; the most prominent exam- 
ples are the Wigner matrices or the sample covariance matrices mentioned in (i) and (ii). For these 
ensembles, apart from the identically distributed Gaussian case, no explicit formula is available for 
the joint eigenvalue distribution. Thus the basic algebraic connection between eigenvalue ensembles 
and orthogonal polynomials is missing and completely new methods needed to be developed. 

The bulk universality for hermitian Wigner ensembles has been established recently in [14], by
Tao and Vu in [39], and in [15]. These works rely on the Wigner matrices with Gaussian divisible
distribution, i.e., ensembles of the form

\[ H + \sqrt{s}\, V, \tag{1.4} \]

where $H$ is a Wigner matrix, $V$ is an independent standard GUE matrix and $s$ is a positive constant.
Johansson [26] (see also Ben Arous and Peche [3] and the recent paper [27]) proved the bulk
universality for the eigenvalues of such matrices by an asymptotic analysis of an explicit formula
for the correlation functions adapted from Brezin-Hikami [6]. Unfortunately, the analogous formula for
symmetric or quaternion self-dual Wigner matrices, as well as for real sample covariance matrices, is
not very explicit, and the technique of [3, 14, 26] cannot be extended to prove universality. Complex
sample covariance matrices can, however, be handled with an analogous formula [3], and universality
without any Gaussian component is a work in progress [I].

A key observation of Dyson is that if the matrix $H + \sqrt{s}\,V$ is embedded into a stochastic matrix
flow, i.e., one considers $H + V(s)$ where the matrix elements of $V(s)$ are independent standard
Brownian motions with variance $s/N$, then the evolution of the eigenvalues is given by a system
of coupled stochastic differential equations (SDE), commonly called the Dyson Brownian motion
(DBM) [12]. If we replace the Brownian motions by the Ornstein-Uhlenbeck processes to keep
the variance constant, then the resulting dynamics on the eigenvalues, which we still call DBM,
has the GUE eigenvalue distribution as the invariant measure. Similar stochastic processes can be
constructed for symmetric, quaternion self-dual and sample covariance type matrices, and, in fact,
on the level of eigenvalue SDE they can be extended to other values of $\beta$ (see (5.5) and (5.8) for
the precise formulas).

The result of [26, 3] can be interpreted as stating that the local statistics of GUE is reached
via DBM for time of order one. In fact, by analyzing the dynamics of DBM with ideas from
the hydrodynamical limit, we have extended Johansson's result to $s \ge N^{-3/4}$ [16]. The key
observation of [16] is that the local statistics of eigenvalues depend exclusively on the approach to 
local equilibrium which in general is faster than reaching the global equilibrium. Unfortunately, the 
identification of local equilibria in [16] still uses explicit representations of correlation functions by 
orthogonal polynomials (following e.g. [32]), and the extension to other ensembles is not a simple 
task. 

In [20] we introduced an approach based on a new stochastic flow, the local relaxation flow, 
which locally behaves like DBM, but has a faster decay to equilibrium. This method completely 
circumvented explicit formulas and it resulted in proving universality for symmetric Wigner matrices 
(the method applies to hermitian and quaternion self-dual Wigner matrices as well). As an input 
of this method, we needed a fairly detailed control on the local density of eigenvalues that could 
be obtained from our previous works on Wigner matrices [17, 18, 15].

In this paper we will prove a general theorem which states that as long as the eigenvalues are on
average within distance $N^{-1/2-\varepsilon}$ of their classical locations, the local statistics are
universal and in particular coincide with the Gaussian case, for which explicit formulas have been
computed. To introduce this flow, denote by $\gamma_j$ the classical location of the $j$-th eigenvalue
that will be defined in (2.12). We first define the pseudo equilibrium measure by

\[ \omega_N = C_N \exp(-N W)\, \mu_N, \qquad W(x) = \sum_{j=1}^N W_j(x_j), \qquad W_j(x) = \frac{1}{2R^2} (x_j - \gamma_j)^2, \tag{1.5} \]



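where $\mu_N$ is the probability measure for the eigenvalue distribution of the corresponding Gaussian
ensemble. In the case of Wigner matrices, $\mu_N$ is the measure for the general $\beta$ ensemble
($\beta \ge 1$, with $\beta = 2$ for GUE):

\[ \mu = \mu_N(dx) = \frac{e^{-N \mathcal{H}(x)}}{Z_\beta}\, dx, \qquad \mathcal{H}(x) = \beta \Big[ \sum_{i=1}^N \frac{x_i^2}{4} - \frac{1}{N} \sum_{i < j} \log |x_j - x_i| \Big]. \tag{1.6} \]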



In this setting, it is natural to view eigenvalues as random points and their equilibrium measure as
a Gibbs measure with a Hamiltonian $\mathcal{H}$. We will freely use the terminology of statistical
mechanics. Note that the additional term $W_j$ in $\omega_N$ confines the $j$-th point $x_j$ near its
classical location, but the probability w.r.t. the equilibrium measure of the event that $x_j$ is
near its classical location will be shown to be very close to 1. Furthermore, we will prove that the
local statistics of the measures $\omega_N$ and $\mu_N$ are identical in the limit $N \to \infty$, and
this justifies the term pseudo equilibrium measure.

The local relaxation flow is defined to be the reversible flow (or the gradient flow) generated
by the pseudo equilibrium measure. The main advantage of the local relaxation flow is that it
has a faster decay to global equilibrium (Theorem 4.2) compared with the DBM. The idea behind
this construction can be related to the treatment of metastability in statistical physics. Imagine
that we have a double well potential and we wish to treat the dynamics of a particle in one of
the two wells. Up to a certain time, say $t_0$, the particle will be confined in the well where it
was initially located. However, the potential of this particle, given by the double well, is not
convex. A naive idea to regain convexity before the time $t_0$ is to modify the potential to be
a single well. As long as we can prove that the particle was confined in the initial well up to
$t_0$, there is no difference between these two dynamics. But the modified dynamics, being w.r.t. a
convex potential, can be estimated much more precisely, and this estimate can be carried over to
the original dynamics up to the time $t_0$.

In our case, the convexity of the equilibrium measure $\mu_N$ is rather weak and, in fact, it comes
from the quadratic confining potential $\beta x_j^2/4$ of (1.6). So the potential is convex, just not
"convex enough". There is no sharp transition like jumping from one metastable state to another as in
the double well case. Instead, there are two time scales: on short time scales the local equilibrium
is formed, while on longer time scales the system approaches the global equilibrium. The approach to
the local equilibrium is governed by a strong intrinsic convexity in certain directions due to the
interactions (see (2.10) later for a precise formula). To reveal this additional convexity, in our
previous paper [20] we introduced a pseudo equilibrium measure where we replaced the long range part
of the interaction by a mean-field potential term using the classical locations of far away
particles. This potential term inherited the intrinsic convexity of the interaction and it could be
directly used to enhance the decay to the local statistics. One technical difficulty with this
approach was that we needed to handle the singular behavior of the logarithmic interaction potential.
In this paper we show that the pseudo equilibrium measure can be defined by adding a Gaussian term.
This simple modification turns out to be sufficient and is also model-independent. Since the Gaussian
modification is regular, we no longer need to deal with singularities. The price to pay is that we
need a slightly stronger local semicircle law, which will be treated in a later section.

The method of local relaxation flow itself proves universality for Wigner matrices with a small
Gaussian component $\sqrt{s}\,V$ (typically of variance $s \ge N^{-\gamma}$ with some $0 < \gamma < 1$).
In other words, we can prove universality for a Wigner ensemble whose single entry distribution (the
distribution of its matrix elements) is given by $e^{tB} u_0$, where $B$ is the generator of the
Ornstein-Uhlenbeck process and $u_0$ is any initial distribution. (We remark that in our approach of
decay to equilibrium, the Brownian motion in the construction of DBM is always replaced by the
Ornstein-Uhlenbeck process.) To obtain universality for Wigner matrices without any Gaussian
component, it remains to prove that for a given Wigner matrix ensemble with a single entry
distribution $\nu$ we can find $u_0$ and $t$ such that the eigenvalue distributions of the ensembles
given by $\nu$ and $e^{tB} u_0$ are very close to each other. By the method of reverse heat flow
introduced in p3], we choose $u_0$ to be an approximation of $e^{-tB} \nu$. Although the
Ornstein-Uhlenbeck evolution cannot be reversed, we can approximately reverse it provided that $\nu$
is sufficiently smooth and the time is short. This enables us to compare local statistics of Wigner
ensembles with and without small Gaussian components assuming that the single entry distribution is
sufficiently smooth (see Section 6).

As an application, we will use this method to prove the bulk universality of sample covariance 
ensembles. The necessary a priori control on the location of eigenvalues will be obtained by a local
semicircle law. In addition to sample covariance ensembles, we will outline the modifications needed 
for proving the bulk universality of symplectic ensembles. 

2 Universality for the local relaxation flow 

In this section, we consider the following general setup. Suppose $\mu = e^{-N\mathcal{H}}/Z$ is a
probability measure on the configuration space $\mathbb{R}^N$ characterized by some Hamiltonian
$\mathcal{H} : \mathbb{R}^N \to \mathbb{R}$, where $Z = \int e^{-N\mathcal{H}(x)}\,dx < \infty$ is the
normalization. We will always assume that $\mathcal{H}$ is symmetric under permutation of the
variables $x = (x_1, x_2, \ldots, x_N) \in \mathbb{R}^N$.

We consider time dependent permutation symmetric probability measures with density $f_t(x)$,
$t \ge 0$, with respect to the measure $\mu(dx) = \mu(x)dx$. The dynamics is characterized by the
forward equation

\[ \partial_t f_t = L f_t, \qquad t \ge 0, \tag{2.1} \]

with a given permutation symmetric initial data $f_0$. Here the generator $L$ is defined via the
Dirichlet form as

\[ D(f) = D_\mu(f) = -\int f L f \, d\mu = \sum_{j=1}^N \frac{1}{2N} \int (\partial_j f)^2 \, d\mu, \qquad \partial_j = \partial_{x_j}. \tag{2.2} \]

Formally, we have $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla \mathcal{H})\nabla$. In Appendix A we
will show that under general conditions on $\mathcal{H}$ the generator can be defined as a
self-adjoint operator on an appropriate domain and the dynamics is well defined for any initial data
$f_0 \in L^1(d\mu)$. Strictly speaking, we will consider a sequence of Hamiltonians $\mathcal{H}_N$
and corresponding dynamics $L_N$ and $f_{t,N}$ parametrized by $N$, but the $N$-dependence will be
omitted. All results will concern the $N \to \infty$ limit.

The expectation with respect to the density $f_t$ will be denoted by $\mathbb{E}_t$ with
$\mathbb{E} := \mathbb{E}_0$. The expectation with respect to the equilibrium measure $\mu$ is denoted
by $\mathbb{E}^\mu$. For any $n \ge 1$ we define the $n$-point correlation functions (marginals) of the
probability measure $f_t \, d\mu$ by

\[ p^{(n)}_{t,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}} f_t(x)\,\mu(x)\, dx_{n+1} \ldots dx_N. \tag{2.3} \]

With a slight abuse of notation, we will sometimes also use $\mu$ to denote the density of the
measure $\mu$ with respect to the Lebesgue measure. The correlation functions of the equilibrium
measure are denoted by

\[ p^{(n)}_{\mu,N}(x_1, x_2, \ldots, x_n) = \int_{\mathbb{R}^{N-n}} \mu(x)\, dx_{n+1} \ldots dx_N. \]

We now list our main assumptions on the initial distribution $f_0$ and on its evolution $f_t$. We
first define the subdomain

\[ \Sigma_N := \{ x \in \mathbb{R}^N, \; x_1 < x_2 < \ldots < x_N \} \tag{2.4} \]

of ordered sets of points $x$. In the application to the sample covariance matrices, we will use the
subdomain

\[ \Sigma_N^+ := \{ x \in \mathbb{R}^N, \; 0 < x_1 < x_2 < \ldots < x_N \} \tag{2.5} \]

of ordered sets of positive points.

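Assumption I. The Hamiltonian $\mathcal{H}$ of the equilibrium measure has the form

\[ \mathcal{H} = \mathcal{H}_N(x) = \beta \Big[ \sum_{j=1}^N U(x_j) - \frac{1}{N} \sum_{i < j} \log |x_i - x_j| \Big], \tag{2.6} \]

where $\beta \ge 1$. The function $U : \mathbb{R} \to \mathbb{R}$ is smooth with $U'' \ge 0$ and

\[ U(x) \ge C|x|^\delta \quad \text{for some } \delta > 0 \text{ and } |x| \text{ large}. \tag{2.7} \]

The condition $U'' \ge 0$ can be relaxed to $\inf U'' > -\infty$, see the remark after (4.11).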

Alternatively, in order to discuss the case of the sample covariance matrices, we will also consider
the following modification of Assumption I.

Assumption I'. The Hamiltonian $\mathcal{H}$ of the equilibrium measure has the form

\[ \mathcal{H} = \mathcal{H}_N(x) = \beta \Big[ \sum_{j=1}^N U(x_j) - \frac{1}{N} \sum_{i < j} \log |x_i - x_j| - \frac{1}{N} \sum_{i < j} \log |x_i + x_j| - \frac{c_N}{N} \sum_j \log |x_j| \Big], \tag{2.8} \]

where $\beta \ge 1$ and $c_N \ge 1$. The function $U$ satisfies the same conditions as in Assumption I.

It is easy to check that the condition (2.7) guarantees that the following bound holds for the
normalization constant

\[ |\log Z| \le C N^m \tag{2.9} \]

with some exponent $m$ depending on $\delta$.

In Appendix A we will show that for $\beta \ge 1$ the dynamics (2.1) can be restricted to the
subdomains $\Sigma_N$ or $\Sigma_N^+$, respectively, i.e., the ordering will be preserved under the
dynamics. In the sequel we will thus assume that $f_t$ is a probability measure on $\Sigma_N$ or
$\Sigma_N^+$. We continue to use the notation $f$ and $\mu$ for the restricted measure. Note that the
correlation functions from (2.3) are still defined on $\mathbb{R}^n$, i.e., their arguments remain
unordered.

It follows from Assumption I (or I') that the Hessian matrix of $\mathcal{H}$ satisfies the following
bound:

\[ \langle v, \nabla^2 \mathcal{H}(x) v \rangle \ge \frac{\beta}{N} \sum_{i < j} \frac{(v_i - v_j)^2}{(x_i - x_j)^2}, \qquad v = (v_1, \ldots, v_N) \in \mathbb{R}^N, \quad x \in \Sigma_N \; (\text{or } x \in \Sigma_N^+). \tag{2.10} \]

This convexity bound is the key assumption; our method works for a broad class of general
Hamiltonians as long as (2.10) holds. In particular, an arbitrary many-body potential function
$V(x)$ can be added to the Hamiltonians (2.6), (2.8), as long as $V$ is convex on the open sets
$\Sigma_N$ and $\Sigma_N^+$, respectively. The argument in the proof of the main Theorem 2.1 remains
unchanged, but the technical details of the regularization of the singular dynamics (Appendix B)
become more involved. We do not pursue this direction here since we do not need it for the
application to Wigner and sample covariance matrices.
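
The convexity bound (2.10) can be checked numerically for a Hamiltonian of the form (2.6). The sketch below is only an illustration under stated assumptions: it takes the quadratic choice U(x) = x^2/4 in the usage, approximates the Hessian quadratic form by a central second difference, and compares it with the right-hand side of (2.10). All function names are hypothetical.

```python
import math

def hamiltonian(xs, beta, U):
    """H_N(x) = beta * [ sum_j U(x_j) - (1/N) sum_{i<j} log|x_i - x_j| ], as in (2.6)."""
    n = len(xs)
    confinement = sum(U(x) for x in xs)
    interaction = sum(math.log(abs(xs[i] - xs[j]))
                      for i in range(n) for j in range(i + 1, n))
    return beta * (confinement - interaction / n)

def hessian_form_fd(xs, vs, beta, U, h=1e-4):
    """<v, Hess H(x) v> via a central second difference along the direction v."""
    plus = [x + h * v for x, v in zip(xs, vs)]
    minus = [x - h * v for x, v in zip(xs, vs)]
    return (hamiltonian(plus, beta, U) - 2.0 * hamiltonian(xs, beta, U)
            + hamiltonian(minus, beta, U)) / (h * h)

def convexity_lower_bound(xs, vs, beta):
    """Right-hand side of (2.10): (beta/N) sum_{i<j} (v_i - v_j)^2 / (x_i - x_j)^2."""
    n = len(xs)
    total = sum((vs[i] - vs[j]) ** 2 / (xs[i] - xs[j]) ** 2
                for i in range(n) for j in range(i + 1, n))
    return beta * total / n
```

With a convex U the difference between the two quantities is the nonnegative term beta * sum_j U''(x_j) v_j^2, so the finite-difference value should exceed the lower bound. The bound is large exactly in directions v that change the gaps between nearby points, which is the intrinsic convexity exploited by the method.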

Assumption II. There exists a continuous, compactly supported density function $\varrho(x) \ge 0$,
$\int_{\mathbb{R}} \varrho = 1$, on the real line, independent of $N$, such that for any fixed
$a, b \in \mathbb{R}$

\[ \lim_{N \to \infty} \sup_{t \ge 0} \Big| \int \frac{1}{N} \sum_{j=1}^N \mathbf{1}(x_j \in [a, b])\, f_t(x)\, d\mu(x) - \int_a^b \varrho(x)\,dx \Big| = 0. \tag{2.11} \]

Let $\gamma_j = \gamma_{j,N}$ denote the location of the $j$-th point under the limiting density, i.e.,
$\gamma_j$ is defined by

\[ N \int_{-\infty}^{\gamma_j} \varrho(x)\,dx = j, \qquad 1 \le j \le N, \quad \gamma_j \in \operatorname{supp} \varrho. \tag{2.12} \]

We will call $\gamma_j$ the classical location of the $j$-th point. Note that $\gamma_j$ may not be
uniquely defined if the support of $\varrho$ is not connected, but in this case the next Assumption
III will not be satisfied anyway.
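
As an illustration of the definition (2.12), the classical locations can be computed by bisection once the distribution function of the limiting density is known. The sketch below assumes the semicircle density (3.7) as the limiting density and uses its closed-form distribution function; the names `semicircle_cdf` and `classical_locations` are hypothetical and chosen only for this example.

```python
import math

def semicircle_cdf(x):
    """Distribution function of the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    if x <= -2.0:
        return 0.0
    if x >= 2.0:
        return 1.0
    return 0.5 + x * math.sqrt(4.0 - x * x) / (4.0 * math.pi) + math.asin(x / 2.0) / math.pi

def classical_locations(n, cdf=semicircle_cdf, lo=-2.0, hi=2.0, tol=1e-12):
    """Solve N * cdf(gamma_j) = j for j = 1..N by bisection, as in (2.12)."""
    gammas = []
    for j in range(1, n + 1):
        target = j / n
        a, b = lo, hi
        while b - a > tol:
            mid = 0.5 * (a + b)
            if cdf(mid) < target:
                a = mid
            else:
                b = mid
        gammas.append(0.5 * (a + b))
    return gammas
```

In the bulk the computed locations are spaced at distance of order 1/N, while the spacing is larger near the edges where the density vanishes.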

Assumption III. There exists an $\varepsilon > 0$ such that

\[ \sup_{t \ge N^{-2\varepsilon}} \int \frac{1}{N} \sum_{j=1}^N (x_j - \gamma_j)^2 \, f_t(x)\, \mu(dx) \le C N^{-1-2\varepsilon} \tag{2.13} \]

with a constant $C$ uniformly in $N$.

Under Assumption II, the typical spacing between neighboring points is of order $1/N$ away from
the spectral edges, i.e., in the vicinity of any energy $E$ with $\varrho(E) > 0$. Assumption III
guarantees that typically the random points $x_j$ remain in the $N^{-1/2-\varepsilon}$ vicinity of
their classical location.

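The final assumption is an upper bound on the local density. For any $I \subset \mathbb{R}$, let

\[ \mathcal{N}_I := \sum_{i=1}^N \mathbf{1}(x_i \in I) \]

denote the number of points in $I$.

Assumption IV. For any compact subinterval $I_0 \subset \{E : \varrho(E) > 0\}$, and for any
$\delta > 0$, $\sigma > 0$ there are constants $C_n$, $n \in \mathbb{N}$, depending on $I_0$, $\delta$
and $\sigma$ such that for any interval $I \subset I_0$ with $|I| \ge N^{-1+\sigma}$ and for any
$K \ge 1$, we have

\[ \sup_{\tau \ge N^{-2\varepsilon+\delta}} \int \mathbf{1}\{\mathcal{N}_I \ge K N |I|\}\, f_\tau \, d\mu \le C_n K^{-n}, \qquad n = 1, 2, \ldots, \tag{2.14} \]

where $\varepsilon$ is the exponent from Assumption III.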

The main general theorem is the following: 

Theorem 2.1 Suppose that the Hamiltonian given in (2.6) or (2.8) satisfies Assumption I or I',
respectively. Suppose that Assumptions II, III and IV hold for the solution $f_t$ of the forward
equation (2.1). Assume that at time $t_0 = N^{-2\varepsilon}$ we have
$S_\mu(f_{t_0}) := \int f_{t_0} \log f_{t_0} \, d\mu \le C N^m$ with some fixed exponent $m$ that may
depend on $\varepsilon$. Let $E \in \mathbb{R}$ and $b > 0$ be such that
$\min\{\varrho(x) : x \in [E - b, E + b]\} > 0$. Then for any $\delta > 0$, $\varepsilon' > 0$, for any
integer $n \ge 1$ and for any compactly supported continuous test function
$O : \mathbb{R}^n \to \mathbb{R}$, we have

\[ \sup \int_{E-b}^{E+b} \frac{dE'}{2b} \int d\alpha_1 \ldots d\alpha_n \, O(\alpha_1, \ldots, \alpha_n) \cdots \]

for $\tau = N^{-2\varepsilon+\delta}$, where $\varepsilon > 0$ is the exponent from Assumption III.

Suppose, in addition to Assumptions I-IV, that there exists an $A > 0$ such that, for any $c' > 0$,

\[ \mathbb{P}\Big( \sup_{c'N \le j \le (1-c')N} |x_j - \gamma_j| \ge N^{-1+A} \Big) \le C N^{-c \log\log N} \tag{2.16} \]

for some constants $c$ and $C$ only depending on $c'$. Then for $\tau = N^{-2\varepsilon+\delta}$ we have

\[ \sup \int_{E-b}^{E+b} \frac{dE'}{2b} \int d\alpha_1 \ldots d\alpha_n \, O(\alpha_1, \ldots, \alpha_n) \cdots \tag{2.17} \]



This theorem shows that the local statistics of the points $x_j$ in the bulk with respect to the
time evolved distribution $f_t$ coincide with the local statistics with respect to the equilibrium
distribution $\mu$ as long as $t \gg N^{-2\varepsilon}$. In many applications, the local equilibrium
statistics can be explicitly computed and in the $b \to 0$ limit they become independent of $E$; in
particular this is the case for the classical matrix ensembles (see next section). The restriction
on the time $t \gg N^{-2\varepsilon}$ will be removed by the reverse heat flow argument (see Section
6) for matrix ensembles.

Since the eigenvalues fluctuate at least on a scale $1/N$, the best possible exponent in Assumption
III is $2\varepsilon \sim 1$, but we will only be able to prove it for some $\varepsilon > 0$ for the
ensembles considered in this paper. Similarly, the optimal exponent in (2.16) is $A \sim 0$. If we
use these optimal estimates, $2\varepsilon \sim 1$, $A \sim 0$, and we choose
$\delta = 2\varepsilon \sim 1$, thus $\tau \sim 1$, then we can choose $b \sim N^{-1}$, i.e., we
obtain the universality with essentially no averaging in $E$. On the other hand, the error estimate
is the strongest, of order $\sim N^{-1/2}$, for an averaging on an energy window of size $b \sim 1$.
These errors become weaker if the time $\tau$ is reduced. These considerations are not important in
this paper, but will be useful when good estimates on $\varepsilon$ and $A$ can be obtained.

Convention: Throughout the paper the letters C, c denote positive constants whose values may 
change from line to line and they are independent of the relevant parameters. Since we will always 
take the $N \to \infty$ limit at the end, all estimates are understood for sufficiently large $N$.



3 Universality for Matrix Ensembles 

Now we specialize Theorem 2.1 to Wigner and sample covariance matrices with i.i.d. entries. In
the next sections we give the precise definitions of these ensembles; formulas for the equilibrium
measure and the dynamics will be deferred until Section 5.

In order to apply Theorem 2.1 to Wigner and sample covariance matrices, we need to check
that Assumptions I-IV are satisfied for these ensembles. Assumptions I or I' are satisfied by the
definition of the Hamiltonian; the precise formulas are given in Section 5. Assumption II is satisfied
since the density of eigenvalues is given by the Wigner semicircle law (3.7) for Wigner matrices [13].
In the case of the sample covariance matrices, the singular values of $A$ will play the role of the
$x_j$'s and their density is given by the Marchenko-Pastur law (3.14) after an obvious transformation
(3.15) [29]. In fact, in Section 8 we prove a local version of the Marchenko-Pastur law in analogy with
our previous work on the local semicircle law for Wigner matrices [17, 18]. In Section 9 (Theorem
9.1) we will show that Assumption III is satisfied for these ensembles (more precisely, we will prove
that Assumption III is satisfied for sample covariance matrices; the proof for Wigner matrices is
analogous, and will not be given in detail). Assumption IV will be proved in Lemma 8.1 for the
sample covariance matrices; for Wigner matrices the proof was given, e.g., in Theorem 4.6 of [19].
We remark that the assumption that the matrix entries are identically distributed will only be used
in checking Assumptions III and IV. Assumption II holds under much more general conditions on
the matrix entries. Finally, the a priori estimate on the entropy $S_\mu(f_{t_0})$ follows from the
smoothing property of the OU-flow (see Section 5).



3.1 Definition of the Wigner matrix 

To fix the notation, we assume that in the case of real symmetric matrices, the matrix elements of
$H$ are given by

\[ h_{\ell k} = h_{k\ell} := N^{-1/2} x_{\ell k}, \qquad k < \ell, \tag{3.1} \]

where $x_{\ell k}$ for $k < \ell$ are independent, identically distributed real random variables with
distribution $\nu$ that has zero expectation and variance 1. The diagonal elements are
$h_{kk} = N^{-1/2} x_{kk}$, where $x_{kk}$ are also i.i.d. with distribution $\widetilde\nu$ that has
zero expectation and variance 2. The eigenvalues of $H$ will be denoted by
$x_1 < x_2 < \ldots < x_N$. We will always assume that the distribution $\nu$ is continuous, hence
the eigenvalues are simple with probability one.

In the hermitian case we assume that

\[ h_{\ell k} = \bar{h}_{k\ell} := N^{-1/2} (x_{\ell k} + i y_{\ell k}), \qquad k < \ell, \tag{3.2} \]

where $x_{\ell k}$ and $y_{\ell k}$ are real i.i.d. random variables distributed with the law $\nu$
with zero expectation and variance $\frac{1}{2}$. The diagonal elements $h_{kk}$ are real, centered
and they have variance one with law $\widetilde\nu$. The eigenvalues of $H$ are again denoted by
$x_1 < x_2 < \ldots < x_N$.

Finally, for the quaternion self-dual case we assume that $H$ is a $2N$ by $2N$ complex matrix that
can be viewed as an $N \times N$ matrix with elements consisting of $2 \times 2$ blocks of the form

\[ \begin{pmatrix} z & w \\ -\bar{w} & \bar{z} \end{pmatrix}, \tag{3.3} \]

where $z = a + bi$, $w = c + di$ are arbitrary complex numbers, $a, b, c, d \in \mathbb{R}$. Such a 2
by 2 matrix can be identified with the quaternion $q = a + b\mathbf{i} + c\mathbf{j} + d\mathbf{k} \in \mathbb{H}$
if the quaternion basis elements $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are identified with the standard
Pauli matrices

\[ \mathbf{i} = i\sigma_3 = \begin{pmatrix} i & 0 \\ 0 & -i \end{pmatrix}, \qquad \mathbf{j} = i\sigma_2 = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}, \qquad \mathbf{k} = i\sigma_1 = \begin{pmatrix} 0 & i \\ i & 0 \end{pmatrix}. \]

The complex numbers $z \in \mathbb{C}$ can be naturally identified with diagonal quaternions via the
identification

\[ z \cong \begin{pmatrix} z & 0 \\ 0 & \bar{z} \end{pmatrix}. \tag{3.4} \]

The dual of the quaternion $q$ is defined to be $q^+ := a - b\mathbf{i} - c\mathbf{j} - d\mathbf{k}$,
which corresponds to the hermitian conjugate of the matrix (3.3).
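
The identification of quaternions with 2 by 2 complex matrices can be verified directly. The following Python sketch, with hypothetical helper names (`matmul2`, `dagger`, `quaternion_matrix`), encodes the three basis matrices given above so that one can check the quaternion relations and the correspondence between the dual and the hermitian conjugate.

```python
def matmul2(a, b):
    """Product of two 2x2 complex matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[r][m] * b[m][c] for m in range(2)) for c in range(2))
        for r in range(2)
    )

def dagger(a):
    """Hermitian conjugate of a 2x2 complex matrix."""
    return tuple(tuple(a[c][r].conjugate() for c in range(2)) for r in range(2))

# Basis elements as in the text: i = i*sigma_3, j = i*sigma_2, k = i*sigma_1.
ONE = ((1 + 0j, 0j), (0j, 1 + 0j))
QI = ((1j, 0j), (0j, -1j))
QJ = ((0j, 1 + 0j), (-1 + 0j, 0j))
QK = ((0j, 1j), (1j, 0j))

def quaternion_matrix(a, b, c, d):
    """2x2 complex matrix representing q = a + b i + c j + d k, in the form (3.3)."""
    z = complex(a, b)
    w = complex(c, d)
    return ((z, w), (-w.conjugate(), z.conjugate()))
```

For example, `matmul2(QI, QJ)` reproduces `QK`, and `dagger(quaternion_matrix(a, b, c, d))` coincides with `quaternion_matrix(a, -b, -c, -d)`, the matrix of the dual quaternion.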

Using this identification, $H$ can be viewed as an $N \times N$ matrix with quaternion entries. The
matrix $H$ is quaternion self-dual if its entries satisfy $h_{\ell k} = h_{k\ell}^+$; in particular,
the diagonal elements $h_{kk}$ are real. We assume that the offdiagonal elements of $H$ are given (in
the quaternion notation) by

\[ h_{\ell k} = h_{k\ell}^+ := N^{-1/2} (x_{\ell k} + \mathbf{i} y_{\ell k} + \mathbf{j} z_{\ell k} + \mathbf{k} u_{\ell k}), \qquad 1 \le k < \ell \le N, \tag{3.5} \]

where $x_{\ell k}$, $y_{\ell k}$, $z_{\ell k}$ and $u_{\ell k}$ are real i.i.d. random variables with
law $\nu$ that has zero expectation and variance $\frac{1}{4}$. The diagonal entries are real,

\[ h_{kk} = N^{-1/2} x_{kk}, \qquad 1 \le k \le N, \]

where $x_{kk}$ has a law $\widetilde\nu$ with zero expectation and variance $\frac{1}{2}$. The
spectrum of $H$ is doubly degenerate and we will neglect this degeneracy, i.e., we consider only $N$
real (typically distinct) eigenvalues, $x_1 < x_2 < \ldots < x_N$.

The Gaussian ensembles (GOE, GUE and GSE) are special Wigner ensembles with $\nu$ and $\widetilde\nu$
being Gaussian distributions. These ensembles are invariant under their corresponding symmetry
group, i.e., the distribution remains unchanged under the conjugation $H \to UHU^*$. Here $U$ is an
arbitrary orthogonal matrix in the case of GOE, it is a unitary matrix for GUE and it is a unitary
matrix over the quaternions in the case of GSE. In the latter case, if one uses the
$(2N) \times (2N)$ complex matrix representation, then the symmetry group is
$\mathrm{Sp}(N) = \mathrm{Sp}(N, \mathbb{C}) \cap \mathrm{SU}(2N)$.

With the given normalization, the eigenvalues are supported asymptotically in $[-2, 2]$ in all
three cases. Moreover, their empirical density converges weakly to the Wigner semicircle law in
probability [33], i.e., for any $J \in C_0(\mathbb{R})$ and for any $\varepsilon > 0$, we have

\[ \lim_{N \to \infty} \mathbb{P}\Big\{ \Big| \frac{1}{N} \sum_{j=1}^N J(x_j) - \int J(x) \varrho_{sc}(x)\,dx \Big| \ge \varepsilon \Big\} = 0, \tag{3.6} \]

where

\[ \varrho_{sc}(x) := \frac{1}{2\pi} \sqrt{(4 - x^2)_+}. \tag{3.7} \]

In particular, the typical spacing between neighboring eigenvalues is of order 1/N in the bulk of 
the spectrum. 

We will often need to assume that the distributions $\nu$ and $\widetilde\nu$ have Gaussian decay,
i.e., there exists $\delta_0 > 0$ such that

\[ \int_{\mathbb{R}} \exp[\delta_0 x^2]\, d\nu(x) < \infty, \qquad \int_{\mathbb{R}} \exp[\delta_0 x^2]\, d\widetilde\nu(x) < \infty. \tag{3.8} \]

In several statements we can relax this condition to assuming only subexponential decay, i.e., that
there exist $\delta_0 > 0$ and $\gamma > 0$ such that

\[ \int_{\mathbb{R}} \exp[\delta_0 |x|^\gamma]\, d\nu(x) < \infty, \qquad \int_{\mathbb{R}} \exp[\delta_0 |x|^\gamma]\, d\widetilde\nu(x) < \infty. \tag{3.9} \]

For some statements we will need to assume that the measures $\nu$, $\widetilde\nu$ satisfy the
logarithmic Sobolev inequality, i.e., for any density $h \ge 0$ with $\int h \, d\nu = 1$ it holds that

\[ \int h \log h \, d\nu \le C \int |\nabla \sqrt{h}|^2 \, d\nu \tag{3.10} \]

and a similar bound holds for $\widetilde\nu$. We remark that (3.10) implies (3.8), see, e.g., [28].

3.2 Sample Covariance Matrix 

The real sample covariance matrix ensemble consists of symmetric $N \times N$ matrices of the form
$H = A^*A$. Here $A$ is an $M \times N$ real matrix with $d = N/M$ fixed and we assume that
$0 < d < 1$. The elements of $A$ are given by

\[ A_{\ell k} = M^{-1/2} x_{\ell k}, \qquad 1 \le \ell \le M, \quad 1 \le k \le N, \tag{3.11} \]

where $x_{\ell k}$ are real i.i.d. random variables with the distribution $\nu$ that is symmetric and
has variance 1. In the case of the complex sample covariance ensemble we assume that

\[ A_{\ell k} = M^{-1/2} (x_{\ell k} + i y_{\ell k}), \qquad 1 \le \ell \le M, \quad 1 \le k \le N, \tag{3.12} \]

where $x_{\ell k}$ and $y_{\ell k}$ are symmetric, real i.i.d. random variables with distribution $\nu$
that has variance $\frac{1}{2}$. We will assume that $\nu$ has Gaussian (3.8) or sometimes only
subexponential (3.9) decay. The spectrum of $H$ asymptotically lies in the interval
$[\lambda_-, \lambda_+]$, where

\[ \lambda_\pm = (1 \pm d^{1/2})^2. \tag{3.13} \]



Moreover, analogously to (3.6), the empirical density of eigenvalues converges weakly in probability
to the Marchenko-Pastur law

$$\varrho_W(x) = \frac{1}{2\pi d}\sqrt{\frac{[(\lambda_+ - x)(x - \lambda_-)]_+}{x^2}}. \qquad (3.14)$$

Most of the analysis will be done for the singular values of $A$ that are denoted by $\mathbf{x} = (x_1, \ldots, x_N)$.
They are supported asymptotically in $[\sqrt{\lambda_-}, \sqrt{\lambda_+}]$ and therefore the typical spacing between
neighboring singular values is of order $1/N$. Their empirical density converges to

$$\widetilde\varrho_W(x) := 2x\,\varrho_W(x^2) = \frac{1}{\pi d}\sqrt{\frac{[(\lambda_+ - x^2)(x^2 - \lambda_-)]_+}{x^2}}. \qquad (3.15)$$
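As a sanity check on (3.13)-(3.15), one can verify numerically that both densities have total mass one and that the change of variables $\lambda = x^2$ relates them. This is an illustrative sketch only; the function names and the choice $d = 1/2$ are assumptions made for the example.

```python
import numpy as np

def mp_edges(d):
    """Spectral edges (3.13): lambda_pm = (1 +- sqrt(d))^2, for 0 < d < 1."""
    return (1.0 - np.sqrt(d)) ** 2, (1.0 + np.sqrt(d)) ** 2

def mp_density(x, d):
    """Marchenko-Pastur density (3.14) for the eigenvalues of A^* A."""
    lo, hi = mp_edges(d)
    x = np.asarray(x, dtype=float)
    inside = np.clip((hi - x) * (x - lo), 0.0, None)
    return np.sqrt(inside) / (2.0 * np.pi * d * x)

def singular_value_density(x, d):
    """Density (3.15) of the singular values: 2 x rho_W(x^2)."""
    x = np.asarray(x, dtype=float)
    return 2.0 * x * mp_density(x ** 2, d)

d = 0.5  # illustrative aspect ratio N/M
lo, hi = mp_edges(d)

lam = np.linspace(lo, hi, 200001)
mass_eigenvalues = float(np.sum(mp_density(lam, d)) * (lam[1] - lam[0]))

s = np.linspace(np.sqrt(lo), np.sqrt(hi), 200001)
mass_singular = float(np.sum(singular_value_density(s, d)) * (s[1] - s[0]))
```

Both Riemann sums come out close to 1, as expected for probability densities supported on $[\lambda_-, \lambda_+]$ and $[\sqrt{\lambda_-}, \sqrt{\lambda_+}]$ respectively.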
We remark that the assumption that $\nu$ is symmetric is used only at one technical step, namely
when we refer to the large deviation result for the extreme eigenvalues of the sample covariance
matrices in [22] (see Lemma 9.2 below). The similar result for Wigner matrices has been proven
without the symmetry condition, see Theorem 1.4 in [42].

3.3 Main Theorems 

With the remarks at the beginning of Section 3, Theorem 2.1 applies directly to prove universality
for Wigner and sample covariance ensembles with a small Gaussian component; we will not state
these theorems separately. To remove the small time restriction from Theorem 2.1 we will apply
the reverse heat flow argument. This will give our main result:

Theorem 3.1 Consider an $N \times N$ symmetric, hermitian or quaternion self-dual Wigner matrix
$H$, or an $N \times N$ real or complex sample covariance matrix $A^*A$. Assume that the single site
entries of $H$ or $A$ are i.i.d. with probability distribution $\nu(\mathrm{d}x) = u_0(x)\mathrm{d}x$ and with the standard
normalization specified in Sections 3.1 and 3.2. We assume that $\nu$ satisfies the logarithmic Sobolev
inequality (3.10) and in case of the sample covariance matrix we also assume that $\nu$ is symmetric.
The same conditions are assumed for the distribution $\widetilde\nu$ of the diagonal elements in case of the
Wigner matrix. Let $f_0 = f_{0,N}$ denote the joint density function of the eigenvalues and let $p^{(k)}_{f_0,N}$
be the $k$-point correlation function of $f_0$. Let $\varrho$ denote the corresponding density of states, i.e., $\varrho$
is given by the Wigner semicircle law (3.7) or the Marchenko-Pastur law (3.14), respectively. Let
$E \in \mathbb{R}$, $b > 0$ such that $\min\{\varrho(x) : x \in [E - b, E + b]\} > 0$. If for any $k \ge 1$ there is a constant
$M_k$ such that the density function $u_0$ satisfies

$$\sum_{j=0}^{M_k} \big|\partial_x^j \log u_0(x)\big| \le C_k (1 + |x|)^{C_k} \qquad (3.16)$$

for some constants $C_k < \infty$, then for any compactly supported continuous test function $O : \mathbb{R}^k \to \mathbb{R}$
we have

$$\lim_{N\to\infty} \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int_{\mathbb{R}^k} \mathrm{d}\alpha_1 \ldots \mathrm{d}\alpha_k\, O(\alpha_1, \ldots, \alpha_k)\, \frac{1}{\varrho(E)^k} \Big( p^{(k)}_{f_0,N} - p^{(k)}_{\mu,N} \Big)\Big( E' + \frac{\alpha_1}{N\varrho(E)}, \ldots, E' + \frac{\alpha_k}{N\varrho(E)} \Big) = 0. \qquad (3.17)$$

Here $\mu$ denotes the probability measure of the eigenvalues of the appropriate Gaussian ensemble,
i.e., GUE, GOE, GSE for the case of hermitian, symmetric, and, respectively, quaternion self-dual
Wigner matrices; and the ensembles of real or complex sample covariance matrices with Gaussian
entries (Wishart ensemble) in case of the covariance matrices $A^*A$. These measures are given in
(2.6), with $\beta = 1, 2, 4$, for Wigner matrices, and, expressed in terms of singular values, in (5.6),
with $\beta = 1, 2$, for sample covariance matrices.
Remark 1.: In the case of symmetric and hermitian Wigner matrices, the condition (3.16)
can be removed by applying the Four-moment theorem of Tao and Vu (Theorem 15 of [39]) as in
the proof of Corollary 2.4 of [20]. A similar remark applies to the sample covariance ensembles and
to the quaternion self-dual Wigner ensemble provided the corresponding Four-moment theorem is
established.

We also remark that a manuscript by Ben Arous and Péché [1] with a similar statement is in
preparation for complex sample covariance matrices that holds for a fixed $E$, i.e., without averaging
over the energy parameter in (3.17).

Remark 2.: After the first version of this manuscript was posted on the arXiv, the question
of whether the four moment theorem holds for sample covariance matrices was settled in [41].
In particular, [41] gives an alternative proof of the universality of local statistics for the complex
sample covariance ensemble when combined with the result of [3]. For the real sample covariance
ensemble the universality was established for distributions whose first four moments match the
standard Gaussian variable. An important common ingredient to both our approach and that of
[41] is the local Marchenko-Pastur law, established in Proposition 8.1; a slightly different version
suitable for the application to prove the four moment theorem is proved in [41].

The four moment theorem in [41] compares the distributions of individual eigenvalues for two
different ensembles. For our application to the correlation functions and gap distributions, an
alternative approach is to use the recent Green function comparison theorem [21]. This will also
remove the smoothness and logarithmic Sobolev inequality restrictions in Theorem 3.1.

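We now state our result concerning the eigenvalue gap distribution both for Wigner and sample
covariance ensembles. For any $s > 0$ and $E$ with $\varrho(E) > 0$ we define the density of eigenvalue pairs
with distance less than $s/(N\varrho(E))$ in the vicinity of $E$ by

$$\Lambda(E; s) := \frac{1}{2N t_N \varrho(E)}\, \#\Big\{ 1 \le j \le N-1 \,:\, x_{j+1} - x_j \le \frac{s}{N\varrho(E)},\ |x_j - E| \le t_N \Big\}, \qquad (3.18)$$

where $t_N = N^{-1+\delta}$ for some $0 < \delta \ll 1$.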
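To make the counting in (3.18) concrete, the statistic can be evaluated on a given configuration of points. The sketch below is illustrative: the function name `gap_density` and the equally spaced test configuration are assumptions, and the density $\varrho(E)$ is passed in by hand rather than computed.

```python
import numpy as np

def gap_density(x, E, s, rho_E, t_N):
    """Evaluate Lambda(E; s) from (3.18) for a point configuration x.

    Counts indices j with x_{j+1} - x_j <= s / (N rho_E) and |x_j - E| <= t_N,
    then normalizes by 2 N t_N rho_E.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    gaps = np.diff(x)   # x_{j+1} - x_j for j = 1, ..., N-1
    left = x[:-1]       # the left endpoint x_j of each gap
    hits = (gaps <= s / (n * rho_E)) & (np.abs(left - E) <= t_N)
    return np.count_nonzero(hits) / (2.0 * n * t_N * rho_E)

# Equally spaced points with density 1 on [0, 1): every gap equals 1/N.
points = np.arange(100) / 100.0
lam_wide = gap_density(points, E=0.5, s=2.0, rho_E=1.0, t_N=0.105)
lam_narrow = gap_density(points, E=0.5, s=0.5, rho_E=1.0, t_N=0.105)
```

For this rigid configuration every rescaled gap equals 1, so the statistic jumps from 0 (for $s < 1$) to 1 (for $s > 1$); for random matrix spectra it interpolates smoothly in $s$.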

Theorem 3.2 Consider an $N \times N$ Wigner or sample covariance matrix as in Theorem 3.1 such
that the probability measure $\mathrm{d}\nu = u_0\,\mathrm{d}x$ of the matrix elements satisfies the logarithmic Sobolev
inequality (3.10) and, additionally, $\nu$ is symmetric in the sample covariance matrix case. Suppose
that the initial density $u_0$ satisfies

$$\sum_{j=0}^{M} \big|\partial_x^j \log u_0(x)\big| \le C(1 + |x|)^{C} \qquad (3.19)$$

with some sufficiently large constants $C$, $M$ that depend on the $\varepsilon$ in Assumption III. Then for any
$E$ with $\varrho(E) > 0$ and for any continuous, compactly supported test function $O : \mathbb{R} \to \mathbb{R}$ we have

$$\lim_{N\to\infty} \int_{\mathbb{R}} \mathrm{d}s\, O(s) \big[ \mathbb{E}\,\Lambda(E; s) - \mathbb{E}_\mu \Lambda(E; s) \big] = 0, \qquad (3.20)$$

where $\mu$ is the probability measure of the eigenvalues of the appropriate Gaussian ensemble, as in
Theorem 3.1.

Theorem 3.2 shows that, in particular, the probability to find no eigenvalue in the interval
$[E, E + \alpha/(\varrho(E)N)]$ is asymptotically the same as in the corresponding classical Gaussian ensemble.
Theorems 3.1 and 3.2 will follow from Theorem 2.1 and the reverse heat flow argument that we
present in Section 6. We remark that the additional condition on the symmetry of $\nu$ in the case
of sample covariance matrices stems from using a result from [22] on the lowest eigenvalue of these
matrices, see Lemma 9.2.

Theorem 3.2 can be proven directly from Theorem 4.1 since the test functions of the form

$$\frac{1}{N} \sum_{i \in J} G\big( N(x_i - x_{i+1}) \big)$$

determine the distribution of the random variable $\Lambda(E; s)$ uniquely. Here we take $J$ to be the set

$$J := \{ i : \gamma_i \in [E - t_N, E + t_N] \},$$

where $\gamma_i$ was defined in (2.12). Notice that $\delta$ in the definition of $t_N$ has to be small enough so that
the edge term near the boundary of the interval is negligible.



4 Local Relaxation Flow 



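Theorem 4.1 (Universality of Dyson Brownian Motion for Short Time) Suppose that the
Hamiltonian $\mathcal{H}$ given in (2.6) satisfies the convexity bound (2.10) with $\beta \ge 1$. Let $f_t$ be the solution
of the forward equation (2.1) with an initial density $f_0$. Fix a positive $\varepsilon > 0$, set $t_0 = N^{-2\varepsilon}$ and
define

$$Q := \sup_{t \ge t_0} \sum_j \int (x_j - \gamma_j)^2 f_t\,\mathrm{d}\mu. \qquad (4.1)$$

Assume that at time $t_0$ we have $S_\mu(f_{t_0}) := \int f_{t_0} \log f_{t_0}\,\mathrm{d}\mu \le CN^m$ with some fixed exponent $m$ that
may depend on $\varepsilon$. Fix $n \ge 1$ and an array of positive integers, $\mathbf{m} = (m_1, m_2, \ldots, m_n) \in \mathbb{N}_+^n$. Let
$G : \mathbb{R}^n \to \mathbb{R}$ be a bounded smooth function with compact support and define

$$\mathcal{G}_{i,\mathbf{m}}(\mathbf{x}) := G\big( N(x_i - x_{i+m_1}), N(x_{i+m_1} - x_{i+m_2}), \ldots, N(x_{i+m_{n-1}} - x_{i+m_n}) \big). \qquad (4.2)$$

Then for any sufficiently small $\varepsilon' > 0$, there exist constants $C$, $c$, depending only on $\varepsilon'$ and $G$, such
that for any $J \subset \{1, 2, \ldots, N - m_n\}$ and for any $\tau \ge 3t_0 = 3N^{-2\varepsilon}$, we have

$$\Big| \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x}) f_\tau\,\mathrm{d}\mu - \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\,\mathrm{d}\mu \Big| \le C N^{\varepsilon'} \sqrt{|J|\, Q\, (\tau N)^{-1}} + C e^{-cN^{\varepsilon'}}, \qquad (4.3)$$

where $|J|$ is the number of the elements in $J$.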
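The averaged observable appearing on both sides of (4.3) depends on the configuration only through rescaled differences of points. The following sketch evaluates it for a given configuration; the function name, the 0-based indexing of $J$, and the Gaussian stand-in for $G$ (which is bounded and smooth but not compactly supported) are assumptions made for illustration.

```python
import numpy as np

def local_gap_observable(x, G, m, J):
    """Compute (1/N) * sum_{i in J} G_{i,m}(x) with G_{i,m} as in (4.2).

    x : sorted configuration of N points
    G : callable of n real arguments
    m : increasing tuple (m_1, ..., m_n) of positive integers
    J : iterable of 0-based indices i with i + m_n < N
    """
    x = np.asarray(x, dtype=float)
    n_points = x.size
    total = 0.0
    for i in J:
        idx = [i] + [i + mk for mk in m]
        # Arguments N(x_i - x_{i+m_1}), N(x_{i+m_1} - x_{i+m_2}), ...
        args = [n_points * (x[a] - x[b]) for a, b in zip(idx[:-1], idx[1:])]
        total += G(*args)
    return total / n_points

# Illustrative choice: a smooth bounded G of two variables.
G = lambda u, v: np.exp(-(u * u + v * v))
x = np.arange(10) / 10.0
value = local_gap_observable(x, G, m=(1, 3), J=range(7))
```

For equally spaced points each summand equals $G(-1, -2)$, so the observable reduces to $(|J|/N)\,G(-1,-2)$; the theorem compares the expectation of such quantities under $f_\tau \mu$ and under $\mu$.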
The proof of this theorem is similar to, but much simpler than, that of Theorem 2.1 of [20]. The
estimate (4.3) improves slightly over the similar estimate in [20] by a factor $|J|/N$ due to the
improvement in (4.19). Theorem 2.1 will follow from the fact that in case $\tau \ge N^{-2\varepsilon+\delta}$, the
assumption (2.13) guarantees that

$$N^{\varepsilon'} \sqrt{|J|\, Q\, (\tau N)^{-1}} \le N^{\varepsilon' - \delta/2} = N^{-\delta/6} \to 0$$

with the choice $\varepsilon' = \delta/3$ and using $|J| \le N$. A more precise error bound will be obtained by relating
$b$ to $|J|$. Therefore the local statistics of observables involving eigenvalue differences coincide in the
$N \to \infty$ limit. To complete the proof of Theorem 2.1, we will have to show that the convergence
of the observables $\mathcal{G}_{i,\mathbf{m}}$ is sufficient to identify the correlation functions of the $x_i$'s in the sense
prescribed in Theorem 2.1. The details will be given in Section 7.
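The exponent bookkeeping behind the last display can be written out explicitly. The following is only a sketch: it assumes that the a priori estimate (2.13) takes the form $Q \le C N^{-2\varepsilon}$ (the form in which it is used in (7.8)) and it absorbs the constant into the power of $N$, as the text does.

```latex
N^{\varepsilon'} \sqrt{\frac{|J|\, Q}{\tau N}}
  \;\le\; C\, N^{\varepsilon'} \sqrt{\frac{N \cdot N^{-2\varepsilon}}{N^{-2\varepsilon+\delta} \cdot N}}
  \;=\; C\, N^{\varepsilon' - \delta/2}
  \;=\; C\, N^{-\delta/6}
  \qquad \text{for } \varepsilon' = \delta/3 .
```

The three inputs are $|J| \le N$, $Q \le C N^{-2\varepsilon}$ and $\tau \ge N^{-2\varepsilon+\delta}$; any choice $\varepsilon' < \delta/2$ would give a vanishing bound, and $\varepsilon' = \delta/3$ is simply a convenient one.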

Proof of Theorem 4.1. Without loss of generality we can assume in the sequel that $f_0 \in L^\infty(\mathrm{d}\mu)$.
To see this, note that any $f_0 \in L^1(\mathrm{d}\mu)$ can be approximated by a sequence of bounded functions
$f_0^{(k)}$ in $L^1$-norm with arbitrary precision and the dynamics is a contraction in $L^1$ (see Appendix
A), thus $f_\tau$ and $f_\tau^{(k)}$ are arbitrarily close in $L^1$. Since $G$ is bounded on the left hand side of (4.3),
this is sufficient to pass to the limit $k \to \infty$.

Every constant in this proof depends on $\varepsilon'$ and $G$, and we will not follow the precise dependence.
We can assume that $\varepsilon' < \varepsilon$. Given $\tau > 0$, we define

$$R := \tau^{1/2} N^{-\varepsilon'/2}. \qquad (4.4)$$

Notice that the choice of $R$ depends on $\tau$, which is the main reason that $\tau$ appears in the
denominator on the right hand side of (4.3). We now introduce the pseudo equilibrium measure,
$\omega_N = \omega = \psi\mu_N$, defined by

$$\psi = \frac{Z}{\widetilde Z} \exp(-NW), \qquad W(\mathbf{x}) = \sum_{j=1}^N W_j(x_j), \qquad W_j(x) = \frac{1}{2R^2}(x - \gamma_j)^2,$$

where $\widetilde Z$ is chosen such that $\omega$ is a probability measure; in particular $\omega = e^{-N\widetilde{\mathcal{H}}}/\widetilde Z$ with

$$\widetilde{\mathcal{H}} = \mathcal{H} + W. \qquad (4.5)$$

Similarly to (2.9), one can check that

$$|\log \widetilde Z| \le CN^m \qquad (4.6)$$

with some exponent $m$.

Note that the additional term $W_j$ confines the $j$-th point $x_j$ near its classical location. We will
prove that the probability w.r.t. the equilibrium measure $\mu_N$ of the event that $x_j$ is near its classical
location is very close to 1. Thus there is little difference between the two measures $\omega_N$ and $\mu_N$
and in fact, we will prove that their local statistics are identical in the limit $N \to \infty$. The main
advantage of the pseudo equilibrium measure comes from the fact that it has a faster decay to
global equilibrium as shown in Theorem 4.2.

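The local relaxation flow is defined to be the reversible dynamics w.r.t. $\omega$. The dynamics is
described by the generator $\widetilde L$ defined by

$$\int f \widetilde L g\,\mathrm{d}\omega = -\frac{1}{2N} \sum_j \int \partial_j f\, \partial_j g\,\mathrm{d}\omega. \qquad (4.7)$$

Explicitly, $\widetilde L$ is given by

$$\widetilde L = L + \sum_j b_j \partial_j, \qquad b_j = -\frac{1}{2} W_j'(x_j) = -\frac{x_j - \gamma_j}{2R^2}. \qquad (4.8)$$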



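Since the additional potential $W_j$ is uniformly convex with

$$\inf_j \inf_{x \in \mathbb{R}} W_j''(x) \ge R^{-2}, \qquad (4.9)$$

by (2.10) and $\beta \ge 1$ we have

$$\big\langle \mathbf{v}, \nabla^2 \widetilde{\mathcal{H}}(\mathbf{x}) \mathbf{v} \big\rangle \ge \frac{1}{R^2} \|\mathbf{v}\|^2 + \frac{1}{N} \sum_{i<j} \frac{(v_i - v_j)^2}{(x_i - x_j)^2}, \qquad \mathbf{v} \in \mathbb{R}^N. \qquad (4.10)$$

Here we have used $U'' \ge 0$ in the last estimate. If this assumption is replaced by

$$U'' \ge -M \qquad (4.11)$$

for some constant $M$ independent of $N$, then there will be an extra term $-M\|\mathbf{v}\|^2$ in (4.10).
Assuming $\tau \le 1$, we have $R \le N^{-\varepsilon'/2}$, so this extra term can be controlled by the $R^{-2}$ term
and the same proof will go through. Since for the applications in this paper the condition $U'' \ge 0$
is satisfied, we will not use this remark here.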

The $R^{-2}$ in the first term comes from the additional convexity of the local interaction and it
enhances the "local Dirichlet form dissipation". In particular we have the uniform lower bound

$$\nabla^2 \widetilde{\mathcal{H}} = N^{-1}\,\mathrm{Hess}(-\log \omega) \ge R^{-2}. \qquad (4.12)$$

This guarantees that the relaxation time to equilibrium $\omega$ for the $\widetilde L$ dynamics is bounded above by
$CR^2$. We recall the definition of the relative entropy of $f$ with respect to any probability measure
$\mathrm{d}\lambda$:

$$S_\lambda(f) = \int f \log f\,\mathrm{d}\lambda, \qquad S_\lambda(f|\psi) = \int f \log(f/\psi)\,\mathrm{d}\lambda.$$
The first ingredient to prove Theorem 4.1 is the analysis of the local relaxation flow, which
satisfies the logarithmic Sobolev inequality and the following dissipation estimate. Its proof follows
the standard argument in [2] (used in this context in Section 5.1 of [16]). In Appendix B we will
explain how to extend this argument onto the subdomain $\Sigma_N$. Here we only remark that the key
inputs are the convexity bounds (4.10), (4.12) on the Hessian of $\widetilde{\mathcal{H}}$, see (4.10).

Theorem 4.2 Suppose (4.10) holds. Consider the forward equation

$$\partial_t q_t = \widetilde L q_t, \qquad t \ge 0, \qquad (4.13)$$

with an initial condition $q_0$ and with the reversible measure $\omega$. Assume that $q_0 \in L^\infty(\mathrm{d}\omega)$. Then
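we have the following estimates

$$\partial_t D_\omega(\sqrt{q_t}) \le -\frac{1}{2R^2} D_\omega(\sqrt{q_t}) - \frac{1}{2N^2} \int \sum_{i<j} \frac{(\partial_i \sqrt{q_t} - \partial_j \sqrt{q_t})^2}{(x_i - x_j)^2}\,\mathrm{d}\omega, \qquad (4.14)$$

$$\frac{1}{2N^2} \int_0^\infty \mathrm{d}s \int \sum_{i<j} \frac{(\partial_i \sqrt{q_s} - \partial_j \sqrt{q_s})^2}{(x_i - x_j)^2}\,\mathrm{d}\omega \le D_\omega(\sqrt{q_0}), \qquad (4.15)$$

and the logarithmic Sobolev inequality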

$$S_\omega(q) \le CR^2 D_\omega(\sqrt{q}) \qquad (4.16)$$

with a universal constant $C$. Thus the time to equilibrium is of order $R^2$:

$$S_\omega(q_t) \le e^{-Ct/R^2} S_\omega(q_0). \qquad (4.17)$$

□

The estimate (4.15) on the second term in (4.10) plays a key role in the next theorem.



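Theorem 4.3 Suppose that Assumption I holds and we have a density $q \in L^\infty$, $\int q\,\mathrm{d}\omega = 1$. Recall
that $\tau = R^2 N^{\varepsilon'}$. Fix $n \ge 1$, $\mathbf{m} \in \mathbb{N}_+^n$, let $G : \mathbb{R}^n \to \mathbb{R}$ be a bounded smooth function with compact
support and recall the definition of $\mathcal{G}_{i,\mathbf{m}}$ from (4.2). Then for any $J \subset \{1, 2, \ldots, N - n\}$ we have

$$\Big| \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\,\mathrm{d}\omega - \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\, q\,\mathrm{d}\omega \Big| \le C \Big( \frac{|J|\, D_\omega(\sqrt{q})\, \tau}{N^2} \Big)^{1/2} + C e^{-cN^{\varepsilon'}} \sqrt{S_\omega(q)}. \qquad (4.18)$$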

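Proof. For simplicity, we will consider the case when $\mathbf{m} = (1, 2, \ldots, n)$; the general case easily
follows by appropriately redefining the function $G$. Let $q_t$ satisfy

$$\partial_t q_t = \widetilde L q_t, \qquad t \ge 0,$$

with an initial condition $q_0 = q$. Thanks to the exponential decay of the entropy on time scale $\tau \gg R^2$,
see (4.17), and the entropy bound on the initial state $q$, the difference between the local statistics
w.r.t. $q_\tau \omega$ and $q_\infty \omega = \omega$ is subexponentially small in $N$,

$$\Big| \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\, q_\tau\,\mathrm{d}\omega - \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\, q_\infty\,\mathrm{d}\omega \Big| \le \|G\|_\infty \int |q_\tau - 1|\,\mathrm{d}\omega \le C \sqrt{S_\omega(q_\tau)} \le C e^{-cN^{\varepsilon'}} \sqrt{S_\omega(q)},$$

giving the second term on the r.h.s. of (4.18). To compare $q$ with $q_\tau$, by differentiation, we have

$$\int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\, q_\tau\,\mathrm{d}\omega - \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}(\mathbf{x})\, q\,\mathrm{d}\omega$$
$$= -\frac{1}{2N} \int_0^\tau \mathrm{d}s \int \sum_{i \in J} \sum_{k=1}^n \partial_k G\big( N(x_i - x_{i+1}), \ldots, N(x_{i+n-1} - x_{i+n}) \big) \big[ \partial_{i+k-1} q_s - \partial_{i+k} q_s \big]\,\mathrm{d}\omega.$$

Here we used the definition of $\widetilde L$ from (4.7) and note that the $1/N$ factor present in (4.7) cancels
the factor $N$ from the argument of $G$ in (4.2). From the Schwarz inequality and $\partial q = 2\sqrt{q}\,\partial\sqrt{q}$, the
last term is bounded in absolute value by

$$2 \sum_{k=1}^n \Big[ \int_0^\tau \mathrm{d}s \int \sum_{i \in J} \big[ \partial_k G\big( N(x_i - x_{i+1}), \ldots, N(x_{i+n-1} - x_{i+n}) \big) \big]^2 (x_{i+k-1} - x_{i+k})^2\, q_s\,\mathrm{d}\omega \Big]^{1/2}$$
$$\times \Big[ \int_0^\tau \mathrm{d}s \int \frac{1}{N^2} \sum_{i \in J} \frac{1}{(x_{i+k-1} - x_{i+k})^2} \big[ \partial_{i+k-1} \sqrt{q_s} - \partial_{i+k} \sqrt{q_s} \big]^2\,\mathrm{d}\omega \Big]^{1/2} \le C \Big( \frac{|J|\, D_\omega(\sqrt{q})\, \tau}{N^2} \Big)^{1/2}, \qquad (4.19)$$

where we have used (4.15) and that

$$\big[ \partial_k G\big( N(x_i - x_{i+1}), \ldots, N(x_{i+k-1} - x_{i+k}), \ldots, N(x_{i+n-1} - x_{i+n}) \big) \big]^2 (x_{i+k-1} - x_{i+k})^2 \le CN^{-2}$$

since $G$ is smooth and compactly supported. This proves Theorem 4.3.

□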



As a comparison to Theorem 4.3 we state the following result which can be proved in a similar
way.

Lemma 4.4 Let $G : \mathbb{R} \to \mathbb{R}$ be a bounded smooth function with compact support and let a sequence
$E_i$ be fixed. Then we have

$$\Big| \frac{1}{N} \sum_{i=1}^N \int G\big( N(x_i - E_i) \big)\,\mathrm{d}\omega - \frac{1}{N} \sum_{i=1}^N \int G\big( N(x_i - E_i) \big)\, q\,\mathrm{d}\omega \Big| \le C \big( S_\omega(q)\, \tau \big)^{1/2} + C e^{-cN^{\varepsilon'}} \sqrt{S_\omega(q)}. \qquad (4.20)$$

Notice that by exploiting the local Dirichlet form dissipation coming from the second term on
the r.h.s. of (4.14), we have gained the crucial factor $N^{-1/2}$ in the estimate (4.18) compared with
(4.20).

The final ingredient to prove Theorem 4.1 is the following entropy and Dirichlet form estimates.

Theorem 4.5 Suppose that (2.10) holds and recall $\tau = R^2 N^{\varepsilon'} \ge 3t_0$ with $t_0 = N^{-2\varepsilon}$. Let $g_t = f_t/\psi$
so that $S_\mu(f_t|\psi) = S_\omega(g_t)$. Assume that $S_\mu(f_{t_0}) \le CN^m$ with some fixed $m$. Then the entropy and
the Dirichlet form satisfy the estimates:

$$S_\omega(g_{\tau/2}) \le CNR^{-2}Q, \qquad D_\omega(\sqrt{g_\tau}) \le CNR^{-4}Q. \qquad (4.21)$$

Proof. Recall that $\partial_t f_t = L f_t$. The standard estimate on the entropy of $f_t$ with respect to the
invariant measure is obtained by differentiating the entropy twice and using the logarithmic Sobolev
inequality. The entropy and the Dirichlet form in (4.21) are, however, computed with respect to
the measure $\omega$. This yields the additional second term in the following identity [H] that holds for
any probability density $\psi_t$:

$$\partial_t S_\mu(f_t|\psi_t) = -\frac{2}{N} \sum_j \int (\partial_j \sqrt{g_t})^2\, \psi_t\,\mathrm{d}\mu + \int g_t (L - \partial_t) \psi_t\,\mathrm{d}\mu,$$

where $g_t = f_t/\psi_t$. In our application we set $\psi_t$ to be time independent, $\psi_t = \psi = \omega/\mu$, hence we
have

$$\partial_t S_\omega(g_t) = -\frac{2}{N} \sum_j \int (\partial_j \sqrt{g_t})^2\,\mathrm{d}\omega + \int \widetilde L g_t\,\mathrm{d}\omega - \sum_j \int b_j\, \partial_j g_t\,\mathrm{d}\omega.$$

Since $\omega$ is invariant, the middle term on the right hand side vanishes, and from the Schwarz
inequality

$$\partial_t S_\omega(g_t) \le -D_\omega(\sqrt{g_t}) + CN \sum_j \int b_j^2\, g_t\,\mathrm{d}\omega \le -D_\omega(\sqrt{g_t}) + CN\Lambda, \qquad t \ge N^{-2\varepsilon}, \qquad (4.22)$$

where we defined

$$\Lambda := QR^{-4} = \sup_{t \ge N^{-2\varepsilon}} R^{-4} \sum_j \int (x_j - \gamma_j)^2 f_t\,\mathrm{d}\mu. \qquad (4.23)$$

Together with (4.16), we have

$$\partial_t S_\omega(g_t) \le -CR^{-2} S_\omega(g_t) + CN\Lambda, \qquad t \ge N^{-2\varepsilon}. \qquad (4.24)$$

To obtain the first inequality in (4.21), we integrate (4.24) from $t_0 = N^{-2\varepsilon}$ to $\tau/2$, using that
$\tau = R^2 N^{\varepsilon'}$ and $S_\omega(g_{t_0}) \le CN^m + N^2 Q$ with some finite $m$, depending on $\varepsilon$. This a priori bound
follows from

$$S_\omega(g_{t_0}) = S_\mu(f_{t_0}|\psi) = S_\mu(f_{t_0}) - \log Z + \log \widetilde Z + N \int f_{t_0} W\,\mathrm{d}\mu \le CN^m + N^2 Q, \qquad (4.25)$$

where we used (2.9) and (4.6). The second inequality in (4.21) can be obtained from the first one
by integrating (4.22) from $t = \tau/2$ to $t = \tau$ and using the monotonicity of the Dirichlet form in
time. □
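The integration of (4.24) can be spelled out with a Gronwall-type argument. The display below is a sketch under the stated assumptions; $c$ denotes the constant in front of $R^{-2}$ in (4.24), and the exponentially small first term is treated as negligible, as in the text.

```latex
S_\omega(g_{\tau/2})
  \;\le\; e^{-c(\tau/2 - t_0)/R^2}\, S_\omega(g_{t_0})
        + C N \Lambda \int_{t_0}^{\tau/2} e^{-c(\tau/2 - s)/R^2}\, \mathrm{d}s
  \;\le\; e^{-cN^{\varepsilon'}/6}\,\big( C N^m + N^2 Q \big) + \frac{C}{c}\, N \Lambda R^2 ,
\qquad N \Lambda R^2 = N R^{-2} Q .
```

Here $\tau \ge 3t_0$ gives $\tau/2 - t_0 \ge \tau/6$, and $\tau/R^2 = N^{\varepsilon'}$ turns the first factor into $e^{-cN^{\varepsilon'}/6}$. The second term is exactly the right hand side of the first bound in (4.21).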

Finally, we complete the proof of Theorem 4.1. Recall that $\tau = R^2 N^{\varepsilon'}$ and $t_0 = N^{-2\varepsilon}$. Choose
$q_\tau := g_\tau = f_\tau/\psi$ as density $q$ in Theorem 4.3. The condition $q_\tau \in L^\infty$ can be guaranteed by
the approximation argument from the beginning of the proof of Theorem 4.1. Then Theorem 4.5,
Theorem 4.3 together with (4.25) and the fact that $\Lambda\tau = Q\tau^{-1} N^{2\varepsilon'}$ directly imply that

$$\Big| \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}\, f_\tau\,\mathrm{d}\mu - \int \frac{1}{N} \sum_{i \in J} \mathcal{G}_{i,\mathbf{m}}\,\mathrm{d}\omega \Big| \le C N^{\varepsilon'} \sqrt{|J|\, Q\, (\tau N)^{-1}} + C e^{-cN^{\varepsilon'}}, \qquad (4.26)$$

i.e., the local statistics of $f_\tau \mu$ and $\omega$ can be compared. Clearly, equation (4.26) also holds for the
special choice $f_0 = 1$ (for which $f_\tau = 1$), i.e., local statistics of $\mu$ and $\omega$ can also be compared. This
completes the proof of Theorem 4.1. □



5 Equilibrium measure and Dyson Brownian motion 

We will treat the Wigner and sample covariance ensembles in parallel. Let $(x_1, x_2, \ldots, x_N)$
denote the eigenvalues of the Gaussian Wigner ensembles. The joint distribution of
$\mathbf{x} = (x_1, x_2, \ldots, x_N) \in \Sigma_N$ for the Gaussian Wigner ensembles is given by the following measure:

$$\mu = \mu_{\beta,N}(\mathrm{d}\mathbf{x}) = \frac{e^{-N\mathcal{H}_\beta(\mathbf{x})}}{Z_\beta}\,\mathrm{d}\mathbf{x}, \qquad \mathcal{H}_\beta(\mathbf{x}) = \beta \Big[ \sum_{i=1}^N \frac{x_i^2}{4} - \frac{1}{N} \sum_{i<j} \log |x_j - x_i| \Big], \qquad (5.1)$$

where $\beta \ge 1$ is an arbitrary parameter, i.e., this corresponds to choosing $U(x) = x^2/4$ in (2.6).
With a slight abuse of notation we will use $\mu$ for both the measure $\mathrm{d}\mu$ and its density $e^{-N\mathcal{H}_\beta}/Z_\beta$
with respect to the Lebesgue measure. The specific values $\beta = 1, 2, 4$ correspond to the GOE, GUE
and GSE ensembles, respectively.
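We define the following generator

$$L = L_{\beta,N} = \sum_{i=1}^N \frac{1}{2N} \partial_i^2 + \beta \sum_{i=1}^N \Big( -\frac{x_i}{4} + \frac{1}{2N} \sum_{j \ne i} \frac{1}{x_i - x_j} \Big) \partial_i \qquad (5.2)$$

acting on $L^2(\mu)$. The measure $\mu$ is invariant and reversible with respect to the dynamics generated
by $L$. Define the Dirichlet form and entropy by

$$D(f) = D_\mu(f) = -\int f L f\,\mathrm{d}\mu = \sum_{j=1}^N \frac{1}{2N} \int (\partial_j f)^2\,\mathrm{d}\mu, \qquad \text{and} \qquad S(f) = S_\mu(f) := \int f \log f\,\mathrm{d}\mu. \qquad (5.3)$$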
Let $f_t\,\mathrm{d}\mu$ denote the probability measure on the set $\Sigma_N$ at the time $t$ with the given generator
$L$. Then $f_t$ satisfies the forward equation

$$\partial_t f_t = L f_t \qquad (5.4)$$

with initial condition $f_0$. This dynamics is the Dyson Brownian motion.

The Dyson Brownian motion is the corresponding system of stochastic differential equations for
the vector $\mathbf{x}(t)$ that is given by

$$\mathrm{d}x_i = \frac{\mathrm{d}B_i}{\sqrt{N}} + \beta \Big( -\frac{x_i}{4} + \frac{1}{2N} \sum_{j \ne i} \frac{1}{x_i - x_j} \Big) \mathrm{d}t, \qquad 1 \le i \le N, \qquad (5.5)$$

where $\{B_i : 1 \le i \le N\}$ is a collection of independent standard Brownian motions on $\mathbb{R}$. This
SDE is well posed for $\beta \ge 1$, and in particular the points do not cross each other with probability
one, i.e., the process is well defined on $\Sigma_N$ (see, e.g., Section 12.1 of [25]).
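For intuition, the system (5.5) can be simulated with a crude Euler-Maruyama scheme. This is only a sketch: the function names are hypothetical, the discretization does not preserve the ordering of the points exactly (so the configuration is re-sorted after each step, which is an ad hoc fix and not part of the theory), and small time steps are needed because the drift is singular when points collide.

```python
import numpy as np

def dbm_drift(x, beta):
    """Drift of (5.5): beta * (-x_i / 4 + (1 / 2N) * sum_{j != i} 1 / (x_i - x_j))."""
    n = x.size
    diff = x[:, None] - x[None, :]
    np.fill_diagonal(diff, np.inf)  # 1 / inf = 0 removes the j = i term
    return beta * (-x / 4.0 + (1.0 / diff).sum(axis=1) / (2.0 * n))

def simulate_dbm(x0, beta, dt, steps, rng):
    """Euler-Maruyama discretization of (5.5) with noise dB_i / sqrt(N)."""
    x = np.sort(np.asarray(x0, dtype=float))
    n = x.size
    for _ in range(steps):
        noise = rng.normal(size=n) * np.sqrt(dt / n)
        # Re-sorting is a crude guard against discretization-induced crossings.
        x = np.sort(x + dbm_drift(x, beta) * dt + noise)
    return x

rng = np.random.default_rng(1)
x_final = simulate_dbm(np.linspace(-1.5, 1.5, 6), beta=2.0, dt=1e-4, steps=200, rng=rng)
two_point_drift = dbm_drift(np.array([-1.0, 1.0]), beta=1.0)
```

For two points at $\pm 1$ and $\beta = 1$ the confining term contributes $\pm 1/4$ and the repulsion $\mp 1/8$, so the drift is $(1/8, -1/8)$: the points are pushed toward the equilibrium separation.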



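The treatment of the sample covariance ensembles is fully analogous, but the formulas change
slightly. We use the convention in the sample covariance case that $x_i$ denotes the singular values
of $A$ and $\lambda_i = x_i^2$ are the eigenvalues of $A^*A$. Most of the formulas will be in terms of the $x_i$'s; in
particular we consider the joint distribution function $f_0(\mathbf{x})$ of the singular values. The invariant
measure for the singular values is given by (cf. (5.1)):

$$\mu^W = \mu^W_{\beta,N}(\mathrm{d}\mathbf{x}) = \frac{e^{-N\mathcal{H}^W_\beta(\mathbf{x})}}{Z^W_\beta}\,\mathrm{d}\mathbf{x}, \qquad (5.6)$$

where

$$\mathcal{H}^W_\beta(\mathbf{x}) = \beta \Big[ \sum_{i=1}^N \frac{x_i^2}{2d} - \frac{1}{N} \sum_{i<j} \log |x_j^2 - x_i^2| - \Big( \frac{1}{d} - 1 + \frac{1 - \beta^{-1}}{N} \Big) \sum_{i=1}^N \log |x_i| \Big],$$

where $d = N/M$ as in Section 3.2 and $\beta = 1$ when $X$ is a real matrix, $\beta = 2$ when $X$ is a complex matrix. This
formula can be obtained by direct calculation (see also Proposition 2.16 of [23] or Fig. 1 of [11]
after appropriate rescaling). Define the generator (cf. (5.2))

$$L^W = \sum_{i=1}^N \frac{1}{2N} \partial_i^2 + \sum_{i=1}^N \Big( -\frac{\beta x_i}{2d} + \frac{\beta}{2N} \sum_{j \ne i} \Big[ \frac{1}{x_i - x_j} + \frac{1}{x_i + x_j} \Big] + \frac{\beta}{2} \Big( \frac{1}{d} - 1 + \frac{1 - \beta^{-1}}{N} \Big) \frac{1}{x_i} \Big) \partial_i. \qquad (5.7)$$

Finally, the stochastic differential equation is given by (cf. (5.5))

$$\mathrm{d}x_i = \frac{\mathrm{d}B_i}{\sqrt{N}} + \Big( -\frac{\beta x_i}{2d} + \frac{\beta}{2N} \sum_{j \ne i} \Big[ \frac{1}{x_i - x_j} + \frac{1}{x_i + x_j} \Big] + \frac{\beta}{2} \Big( \frac{1}{d} - 1 + \frac{1 - \beta^{-1}}{N} \Big) \frac{1}{x_i} \Big) \mathrm{d}t, \qquad 1 \le i \le N. \qquad (5.8)$$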
In applications to Wigner matrices ($\beta = 1, 2, 4$), $f_0\,\mathrm{d}\mu$ will be the joint probability density of
the eigenvalues of the initial symmetric, hermitian or quaternion self-dual Wigner matrix $H$. The
limiting density is the Wigner semicircle law given in (3.7). The Dyson Brownian motion describes
the eigenvalues of the matrix valued process

$$\mathrm{d}H_t = \frac{1}{\sqrt{N}}\,\mathrm{d}\mathbf{B}_t - \frac{1}{2} H_t\,\mathrm{d}t \qquad (5.9)$$

with $H_0 = H$. Here $\mathbf{B}_t$ is a symmetric, hermitian or quaternion self-dual matrix-valued process
whose offdiagonal elements are standard real, complex or quaternion Brownian motions with
variance one and whose diagonal elements are real Brownian motions with variance 2, 1 and $\frac{1}{2}$, in case
$\beta = 1, 2, 4$, respectively. More precisely, let $u_t$ denote the density function of the distribution of one
real component of the $(ij)$-th entry of $H_t$, $i < j$ (there are two real components for the hermitian
matrices and four for the quaternion matrices), then

$$\partial_t u_t = B u_t, \qquad B = \frac{1}{2\beta} \frac{\partial^2}{\partial x^2} - \frac{x}{2} \frac{\partial}{\partial x}. \qquad (5.10)$$

Let $\gamma(\mathrm{d}x) = \gamma(x)\mathrm{d}x := (\beta/2\pi)^{1/2} e^{-\beta x^2/2}\,\mathrm{d}x$ denote the reversible measure for this process. The
diagonal elements evolve according to an OU process with twice the variance. For any $t > 0$, the
solution to (5.9), $H_t$, has the same distribution as

$$e^{-t/2} H + (1 - e^{-t})^{1/2}\, V, \qquad (5.11)$$

where $V$ is a GUE, GOE or GSE matrix.
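The representation (5.11) says that the matrix at time $t$ is a variance-preserving interpolation between the initial matrix and an independent Gaussian one. The sketch below illustrates this in the real symmetric case; the function names and the GOE normalization (off-diagonal variance $1/N$) are assumptions made for the example.

```python
import numpy as np

def sample_goe(n, rng):
    """Real symmetric Gaussian matrix; assumed off-diagonal variance 1/n."""
    a = rng.normal(size=(n, n))
    return (a + a.T) / np.sqrt(2.0 * n)

def ou_coefficients(t):
    """Weights of H and V in (5.11): e^{-t/2} and (1 - e^{-t})^{1/2}."""
    return np.exp(-t / 2.0), np.sqrt(1.0 - np.exp(-t))

def matrix_ou_marginal(h, v, t):
    """A sample with the law of H_t in (5.11), given H and an independent V."""
    c_h, c_v = ou_coefficients(t)
    return c_h * h + c_v * v

rng = np.random.default_rng(2)
h0 = sample_goe(4, rng)
v = sample_goe(4, rng)
h_t = matrix_ou_marginal(h0, v, t=0.7)
c_h, c_v = ou_coefficients(0.7)
```

Since $c_h^2 + c_v^2 = 1$, the entry variances are unchanged along the flow when $H$ and $V$ have matching variances; only the shape of the entry distribution relaxes toward the Gaussian.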

The generator of the induced stochastic process on the eigenvalues is given by (5.2). The
equilibrium measure $\mu$ is the GUE, GOE or GSE eigenvalue distribution. Theorem 2.1 thus says
in this case that the local eigenvalue statistics of a Wigner random matrix with a small Gaussian
component coincides with the local statistics of the corresponding Gaussian ensemble. The entropy
condition on $S_\mu(f_{t_0})$ in Theorem 2.1 can be easily obtained by

$$S_\mu(f_{t_0}) \le N^2 S_\gamma(u_{t_0}) \le CN^m. \qquad (5.12)$$

In the real or complex sample covariance case ($\beta = 1, 2$), the matrix elements of $A$ evolve according
to the OU process (5.10), i.e., $A_t$ has the same distribution as

$$e^{-t/2} A + (1 - e^{-t})^{1/2}\, W, \qquad (5.13)$$

where $W$ is an $M \times N$ matrix whose elements are i.i.d. real or complex Gaussian variables with
mean 0 and variance $1/\beta$.
6 Reverse heat flow



To remove the short time restriction from Theorem 2.1 in case of Wigner and sample covariance
ensembles and to prove Theorems 3.1 and 3.2 we apply the reverse heat flow argument, presented
first in p3] and used also in Corollary 2.4 of |20| . 

For fixed $\beta = 1, 2$ or $4$, recall the Ornstein-Uhlenbeck process from (5.10) with the reversible
Gaussian measure $\gamma(\mathrm{d}x)$. Let $u$ be a positive density with respect to $\gamma$, i.e., $\int u\,\mathrm{d}\gamma = 1$ and we
write $u(x) = \exp(-V(x))$. Suppose that for any $K$ fixed there are constants $C_1, C_2$ depending on
$K$ such that

$$\sum_{j=1}^{2K} |V^{(j)}(x)| \le C_1 (1 + x^2)^{C_2} \qquad (6.1)$$

and the measure $\mathrm{d}\nu = u\,\mathrm{d}\gamma$ satisfies the subexponential decay condition. We will apply this for the
initial distribution $\mathrm{d}\nu = u_0(x)\mathrm{d}x$, so $u$ and $u_0$ differ by a Gaussian factor.



Proposition 6.1 Suppose that $\nu = u\gamma$ satisfies the subexponential decay condition and (6.1) for
some $K$. Then there is a small constant $\alpha_K$ depending on $K$ such that for $t \le \alpha_K$ there exists a
probability density $g_t$ with mean zero and variance $\beta^{-1}$ such that

$$\int |e^{tB} g_t - u|\,\mathrm{d}\gamma \le C t^K \qquad (6.2)$$

for some $C > 0$ depending on $K$. Furthermore, $g_t$ can be chosen such that if the logarithmic Sobolev
inequality (3.10) holds for the measure $\nu = u\gamma$, then it holds for $g_t\gamma$ as well, with the logarithmic
Sobolev constant changing by a factor of at most 2.

Furthermore, let $\mathcal{B} = B^{\otimes n}$, $F = u^{\otimes n}$ with some $n \le CN^2$. Denote by $G_t = g_t^{\otimes n}$. Then we also
have

$$\int |e^{t\mathcal{B}} G_t - F|\,\mathrm{d}\gamma^{\otimes n} \le C N^2 t^K \qquad (6.3)$$

for some $C > 0$ depending on $K$.

We now explain how to prove Theorems 3.1 and 3.2 from Theorem 2.1 and Proposition 6.1. We
choose $n$ to be the number of independent OU processes needed to generate the flow of the matrix
elements. By choosing $K$ large enough, we can compare the two measures $e^{t\mathcal{B}} G_t$ and $F$ in the total
variational norm; for any observable $J : \mathbb{R}^n \to \mathbb{R}$ of the matrix elements, we have

$$\Big| \int J\, (e^{t\mathcal{B}} G_t - F)\,\mathrm{d}\gamma^{\otimes n} \Big| \le \|J\|_\infty\, C N^2 t^K.$$

In order to prove Theorems 3.1 and 3.2, appropriate observables $J$ need to be chosen that depend
on the matrix elements via the eigenvalues to express the quantities in (3.17) and (3.20). It is easy
to see that $\|J\|_\infty$ may grow at most polynomially in $N$. But we can always choose $K$ large enough
to compensate for it with the choice $t = N^{-2\varepsilon+\delta}$ allowed in Theorem 2.1. Here the verifications of
the Assumptions I-IV of Theorem 2.1 were explained at the beginning of Section 3. This completes
the proof of our main theorems. □

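Proof of Proposition 6.1. Define $\theta(x) = \theta_0(t^\alpha x)$ with some small positive $\alpha > 0$ depending on
$K$, where $\theta_0$ is a smooth cutoff function satisfying $\theta_0(x) = 1$ for $|x| \le 1$ and $\theta_0(x) = 0$ for $|x| \ge 2$.
Set

$$h_s = u + \theta \xi_s, \qquad \text{with} \quad \xi_s := \Big[ -sB + \frac{1}{2} s^2 B^2 - \ldots + (-1)^{K-1} \frac{s^{K-1} B^{K-1}}{(K-1)!} \Big] u.$$

By assumption (6.1), $h_s$ is positive and

$$\frac{2}{3} u \le h_s \le \frac{3}{2} u \qquad (6.4)$$

for any $s \le t$ if $t$ is small enough. To see this, take, e.g., $K = 2$ and we have

$$|\theta(x) \xi_s(x)| \le C s\, \theta_0(t^\alpha x) \big[ |V''(x)| + |V'(x)|^2 + |x V'(x)| \big] |u(x)| \le \frac{1}{3} |u(x)|,$$

where we have used $\alpha \ll 1$, $s \le t$ and the assumption (6.1).
Define $v_s = e^{sB} h_s$ and by definition, $v_0 = u$. Then

$$\partial_s v_s = (-1)^{K-1} \frac{s^{K-1}}{(K-1)!} e^{sB} B^K u + e^{sB} B (\theta - 1) \xi_s + e^{sB} (\theta - 1) \partial_s \xi_s.$$

Since the Ornstein-Uhlenbeck flow is a contraction in $L^1(\mathrm{d}\gamma)$, together with (6.1), we have

$$\int |v_t - u|\,\mathrm{d}\gamma \le C_K \int_0^t \int \Big[ s^{K-1} |B^K u| + |B(\theta - 1) \xi_s| + |(\theta - 1) \partial_s \xi_s| \Big] \mathrm{d}\gamma\,\mathrm{d}s \le C_K t^K \qquad (6.5)$$

for sufficiently small $t$. To estimate the last two terms, we also used that on the support of $\theta - 1$
the measure $\mathrm{d}\gamma$ decays subexponentially in $t$.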

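Notice that $h_t$ may not be normalized as a probability density w.r.t. $\gamma$ but this can be easily
adjusted. To compute this normalization, take for example $K = 2$, so that $\xi_s = -sBu$, and we have,
by using $s \le t^\alpha$ and integrating by parts,

$$\Big| \int \theta(x) \xi_s(x)\,\mathrm{d}\gamma \Big| = s \Big| \int \theta_0(t^\alpha x) B u(x)\,\mathrm{d}\gamma \Big| \le \int |\theta_0'(t^\alpha x)\, u'(x)|\,\mathrm{d}\gamma \le C \int_{|x| \ge t^{-\alpha}/2} |u'(x)|\,\mathrm{d}\gamma.$$

The last term is bounded by $O(t^M)$ for any $M > 0$ since $u(x)\gamma$ has a subexponential decay
and by the assumption (6.1) on $V$.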

We have proved that there is a constant $c_t = 1 + O(t^M)$, for any $M > 0$, such that $c_t h_t$
is a probability density. Clearly,

$$a_t := \int x\, c_t h_t\,\mathrm{d}\gamma = O(t^M), \qquad \sigma_t^2 := \int (x - a_t)^2 c_t h_t\,\mathrm{d}\gamma = \beta^{-1} + O(t^M),$$

and the same formulas hold if $h_t$ is replaced by $v_t$ since the OU flow preserves expectation and
variance. Let $g_t$ be defined by

$$g_t(x)\, e^{-\beta x^2/2} = c_t \sqrt{\beta}\,\sigma_t\, h_t\big( \sqrt{\beta}\,\sigma_t x + a_t \big)\, e^{-\beta (\sqrt{\beta}\,\sigma_t x + a_t)^2/2}.$$

Then $g_t$ is a probability density w.r.t. $\gamma$ with zero mean and variance $\beta^{-1}$. It is easy to check that
the total variation norm of $h_t - g_t$ is smaller than any power of $t$. Using again the contraction
property of $e^{tB}$ and (6.5), we get

$$\int |e^{tB} g_t - u|\,\mathrm{d}\gamma \le C t^K \qquad (6.6)$$

for sufficiently small $t$.

Now we check the LSI constant for $g_t$. Recall that $g_t$ was obtained from $h_t$ by translation and
dilation. By definition of the LSI constant, the translation does not change it. The dilation changes
the constant, but since our dilation constant is nearly one, the change of LSI constant is also nearly
one. So we only have to compare the LSI constants between $\mathrm{d}\nu = u\,\mathrm{d}\gamma$ and $c_t h_t\,\mathrm{d}\gamma$. From (6.4) and
the fact that $c_t$ is nearly one, the LSI constant changes by a factor less than 2. This proves the claim on
the LSI constant.

Finally, (6.3) directly follows from

$$\int |e^{t\mathcal{B}} G_t - F|\,\mathrm{d}\gamma^{\otimes n} \le n \int |e^{tB} g_t - u|\,\mathrm{d}\gamma$$

and this completes the proof of Proposition 6.1. □
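The last inequality is a standard telescoping estimate for product densities. As a sketch, write $a = e^{tB} g_t$ and $b = u$ (shorthand introduced here only for this display), both probability densities w.r.t. $\gamma$, and assume that $\mathcal{B}$ acts independently on each coordinate so that $e^{t\mathcal{B}} G_t = a^{\otimes n}$.

```latex
a^{\otimes n} - b^{\otimes n}
  = \sum_{k=1}^{n} a^{\otimes (k-1)} \otimes (a - b) \otimes b^{\otimes (n-k)},
\qquad\text{hence}\qquad
\int \big| a^{\otimes n} - b^{\otimes n} \big| \,\mathrm{d}\gamma^{\otimes n}
  \le \sum_{k=1}^{n} \Big( \int a \,\mathrm{d}\gamma \Big)^{k-1}
      \int |a - b| \,\mathrm{d}\gamma \,
      \Big( \int b \,\mathrm{d}\gamma \Big)^{n-k}
  = n \int |a - b| \,\mathrm{d}\gamma .
```

Combined with (6.6) and $n \le CN^2$, this gives the bound $CN^2 t^K$ stated in (6.3).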



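7 Proof of Theorem 2.1

We start with the identity

$$\int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int_{\mathbb{R}^n} \mathrm{d}\alpha_1 \ldots \mathrm{d}\alpha_n\, O(\alpha_1, \ldots, \alpha_n)\, \frac{1}{\varrho(E)^n}\, p^{(n)}_{\tau,N}\Big( E' + \frac{\alpha_1}{N\varrho(E)}, \ldots, E' + \frac{\alpha_n}{N\varrho(E)} \Big) \qquad (7.1)$$
$$= C_{N,n} \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int \sum_{i_1 \ne i_2 \ne \ldots \ne i_n} \widetilde O\big( N(x_{i_1} - E'), N(x_{i_1} - x_{i_2}), \ldots, N(x_{i_{n-1}} - x_{i_n}) \big) f_\tau\,\mathrm{d}\mu,$$

where $\widetilde O(u_1, u_2, \ldots, u_n) := O(\varrho(E) u_1, \varrho(E)(u_1 - u_2), \ldots)$ and $C_{N,n} = N^n (N-n)!/N! = 1 + O_n(N^{-1})$.

By permutational symmetry of $p^{(n)}_{\tau,N}$ we can assume that $O$ is symmetric and we can restrict the
last summation to $i_1 < i_2 < \ldots < i_n$ upon an overall factor $n!$. Let $S_n$ denote the set of increasing
positive integers, $\mathbf{m} = (m_2, m_3, \ldots, m_n) \in \mathbb{N}_+^{n-1}$, $m_2 < m_3 < \ldots < m_n$. For a given $\mathbf{m} \in S_n$, we
change indices to $i = i_1$, $i_2 = i + m_2$, $i_3 = i + m_3, \ldots$, and rewrite the sum on the r.h.s. of (7.1) as

$$\sum_{\mathbf{m} \in S_n} \sum_{i=1}^N \widetilde O\big( N(x_i - E'), N(x_i - x_{i+m_2}), N(x_{i+m_2} - x_{i+m_3}), \ldots \big) = \sum_{\mathbf{m} \in S_n} \sum_{i=1}^N Y_{i,\mathbf{m}}(E', \mathbf{x}),$$

where we introduced

$$Y_{i,\mathbf{m}}(E', \mathbf{x}) := \widetilde O\big( N(x_i - E'), N(x_i - x_{i+m_2}), N(x_{i+m_2} - x_{i+m_3}), \ldots, N(x_{i+m_{n-1}} - x_{i+m_n}) \big).$$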
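We will set $Y_{i,\mathbf{m}} = 0$ if $i + m_n > N$. Our goal is to estimate the difference

$$\Theta := \Big| \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int \sum_{\mathbf{m} \in S_n} \sum_{i=1}^N Y_{i,\mathbf{m}}(E', \mathbf{x}) (f_\tau - 1)\,\mathrm{d}\mu \Big|. \qquad (7.2)$$

Let $M$ be an $N$-dependent parameter chosen at the end of the proof; in fact, $M$ will be chosen as
$M = N^c$ with some small positive exponent $c > 0$, depending on $n$. Let

$$S_n(M) := \{ \mathbf{m} \in S_n,\ m_n < M \}, \qquad S_n^c(M) := S_n \setminus S_n(M),$$

and note that $|S_n(M)| \le M^{n-1}$. We have the simple bound $\Theta \le \Theta_M^{(1)} + \Theta_M^{(2)}(\tau) + \Theta_M^{(2)}(\infty)$ where

$$\Theta_M^{(1)} := \Big| \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int \sum_{\mathbf{m} \in S_n(M)} \sum_{i=1}^N Y_{i,\mathbf{m}}(E', \mathbf{x}) (f_\tau - 1)\,\mathrm{d}\mu \Big| \qquad (7.3)$$

and

$$\Theta_M^{(2)}(\tau) := \Big| \int_{E-b}^{E+b} \frac{\mathrm{d}E'}{2b} \int \sum_{\mathbf{m} \in S_n^c(M)} \sum_{i=1}^N Y_{i,\mathbf{m}}(E', \mathbf{x}) f_\tau\,\mathrm{d}\mu \Big|. \qquad (7.4)$$

Note that $\Theta_M^{(2)}(\infty)$ is the same as $\Theta_M^{(2)}(\tau)$ but with $f_\tau$ replaced by the constant 1, i.e., $f_\infty\,\mathrm{d}\mu$ is the
equilibrium.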

Step 1: Small m case

After performing the $\mathrm{d}E'$ integration, we will eventually apply Theorem 4.1 to the function

$$G(u_1, u_2, \ldots) := \int_{\mathbb{R}} \widetilde O(y, u_1, u_2, \ldots)\,\mathrm{d}y,$$

i.e., to the quantity

$$\int_{\mathbb{R}} \mathrm{d}E'\, Y_{i,\mathbf{m}}(E', \mathbf{x}) = \frac{1}{N} G\big( N(x_i - x_{i+m_2}), \ldots \big) \qquad (7.5)$$

for each fixed $i$ and $\mathbf{m}$.

For any $E$ and $0 < \xi < b$ define sets of integers $J = J_{E,b,\xi}$ and $J^\pm = J^\pm_{E,b,\xi}$ by

$$J := \{ i : \gamma_i \in [E - b, E + b] \}, \qquad J^\pm := \{ i : \gamma_i \in [E - (b \pm \xi), E + b \pm \xi] \},$$

where $\gamma_i$ was defined in (2.12). Clearly $J^- \subset J \subset J^+$. With these notations, we have

rE+b AT?I N rE+b j -pi 

j E _ b -W E = /„ 1ST E >W*. x) + lH.(x). (7.6) 

The error term ^j m , defined by (|7.6|) indirectly, comes from those i J + indices, for which 
Xi e [£- 6 + 0(iV Jl ), J B + 6 + 0(iV~ 1 )] since Y; im (£',x) = unless l^-^'l < C/iV, the constant 
depending on the support of O. Thus 

|fijm( x )l ^ Cb- X N- X #{ i:\xi- 7i | > (7.7) 

for any sufficiently large iV, assuming £ ^> 1/iV and using that O is a bounded function. The 
additional iV _1 factor comes from the dE' integration. Taking the expectation with respect to the 
measure f T dfi, we get 

/ l^ m (x)|/ T d^ < Cb' l C 2 N- 1 f £(x< - ^frdfi = Cb-^N- 1 -* (7.8) 

i 

using Assumption III f)2 . 13|) . We can also estimate 

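We can also estimate

$$\int_{E-b}^{E+b} \frac{dE'}{2b} \sum_{i \in J^+} Y_{i,m}(E', x) \le \int_{E-b}^{E+b} \frac{dE'}{2b} \sum_{i \in J^-} Y_{i,m}(E', x) + C b^{-1} N^{-1} |J^+ \setminus J^-|$$
$$= \int_{\mathbb{R}} \frac{dE'}{2b} \sum_{i \in J^-} Y_{i,m}(E', x) + C b^{-1} N^{-1} |J^+ \setminus J^-| + \Xi^+_{J,m}(x) \tag{7.9}$$
$$\le \int_{\mathbb{R}} \frac{dE'}{2b} \sum_{i \in J} Y_{i,m}(E', x) + C b^{-1} N^{-1} |J^+ \setminus J^-| + C b^{-1} N^{-1} |J \setminus J^-| + \Xi^+_{J,m}(x),$$

where the error term $\Xi^+_{J,m}$, defined by (7.9), comes from indices $i \in J^-$ such that $x_i \notin [E - b, E + b] + O(1/N)$. It satisfies the same bound (7.8) as $\Omega^+_{J,m}$.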

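By the continuity of $\varrho$, the density of the $\gamma_i$'s is bounded by $CN$, thus $|J^+ \setminus J^-| \le CN\xi$ and $|J \setminus J^-| \le CN\xi$. Therefore, summing up the formula (7.5) for $i \in J$, we obtain from (7.6) and (7.9)

$$\int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{i=1}^{N} Y_{i,m}(E', x)\, f_\tau\, d\mu \tag{7.10}$$
$$\le (2b)^{-1} \int \sum_{i \in J} \frac{1}{N}\, G\big(N(x_i - x_{i+m_2}), \ldots\big)\, f_\tau\, d\mu + C b^{-1} \xi + C b^{-1} \xi^{-2} N^{-1-2\varepsilon} \tag{7.11}$$

for each $m \in S_n$. A similar lower bound can be proved analogously and we obtain

$$\left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{i=1}^{N} Y_{i,m}(E', x)\, f_\tau\, d\mu - (2b)^{-1} \int \sum_{i \in J} \frac{1}{N}\, G\big(N(x_i - x_{i+m_2}), \ldots\big)\, f_\tau\, d\mu \right| \le C b^{-1} \xi + C b^{-1} \xi^{-2} N^{-1-2\varepsilon} \tag{7.12}$$

for each $m \in S_n$.

Adding up (7.12) for all $m \in S_n(M)$, we get

$$\left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{m \in S_n(M)} \sum_{i=1}^{N} Y_{i,m}(E', x)\, f_\tau\, d\mu - (2b)^{-1} \int \sum_{m \in S_n(M)} \sum_{i \in J} \frac{1}{N}\, G\big(N(x_i - x_{i+m_2}), \ldots\big)\, f_\tau\, d\mu \right|$$
$$\le C M^{n-1} b^{-1} \xi + C M^{n-1} b^{-1} \xi^{-2} N^{-1-2\varepsilon}, \tag{7.13}$$

and the same estimate holds for the equilibrium, i.e., if we set $\tau = \infty$ in (7.13). We now subtract these two formulas and apply (4.3) from Theorem 4.1 to each summand in the second term in (7.13). Choosing $\xi = N^{-(1+2\varepsilon)/3}$ to minimize the two error terms involving $\xi$, we conclude that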



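$$\Theta^{(1)}_M = \left| \int_{E-b}^{E+b} \frac{dE'}{2b} \int \sum_{m \in S_n(M)} \sum_{i=1}^{N} Y_{i,m}(E', x)\,(f_\tau\, d\mu - d\mu) \right| \le C M^{n-1} \Big( b^{-1} N^{-\frac{1+2\varepsilon}{3}} + b^{-1/2} N^{\varepsilon' - \delta/2} \Big), \tag{7.14}$$

where we have used $\tau = N^{-2\varepsilon+\delta}$ and that $|J| \le CNb$.

Step 2. Large m case.

For a fixed $y \in \mathbb{R}$, $\ell > 0$, let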



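$$\chi(y, \ell) := \sum_{i=1}^{N} \mathbf{1}\Big\{ x_i \in \Big[ y - \frac{\ell}{N},\, y + \frac{\ell}{N} \Big] \Big\}$$

denote the number of points in the interval $[y - \ell/N, y + \ell/N]$. Note that for a fixed $m = (m_2, \ldots, m_n)$, we have

$$\sum_{i=1}^{N} |Y_{i,m}(E', x)| \le C \cdot \chi(E', \ell) \cdot \mathbf{1}\big(\chi(E', \ell) \ge m_n\big) \le C \sum_{m = m_n}^{\infty} m \cdot \mathbf{1}\big(\chi(E', \ell) \ge m\big), \tag{7.15}$$

where $\ell$ denotes the maximum of $|u_1| + \ldots + |u_n|$ in the support of $O(u_1, \ldots, u_n)$.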
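The counting function $\chi(y, \ell)$ and the elementary inequality behind the last step of (7.15), read here as $\chi \cdot \mathbf{1}(\chi \ge m_n) \le \sum_{m \ge m_n} m \cdot \mathbf{1}(\chi \ge m)$, can be illustrated as follows. This is only a sketch on a hypothetical point configuration; the function names are ours.

```python
def chi(points, y, ell):
    # Number of points x_i in [y - ell/N, y + ell/N], with N = len(points).
    n = len(points)
    return sum(1 for x in points if y - ell / n <= x <= y + ell / n)

def tail_sum(count, m_n):
    # Sum over m >= m_n of m * 1(count >= m); terms with m > count vanish.
    return sum(range(m_n, count + 1))

points = [k / 10 for k in range(10)]  # hypothetical configuration, N = 10
count = chi(points, 0.45, 2.0)        # window [0.25, 0.65]
for m_n in (1, 2, 4, 5):
    lhs = count if count >= m_n else 0
    print(m_n, lhs, tail_sum(count, m_n))
```

Both sides vanish as soon as $m_n$ exceeds the number of points in the window, which is why only configurations with an atypically large local density contribute to the large $m$ case.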
Since the summation over all increasing sequences $m = (m_2, \ldots, m_n) \in \mathbb{N}_+^{n-1}$ with a fixed $m_n$ contains at most $m_n^{n-2}$ terms, by definition (7.4) we have

$$\Theta^{(2)}_M(\tau) \le C \int_{E-b}^{E+b} \frac{dE'}{2b} \sum_{m=M}^{\infty} m^{n-1} \int \mathbf{1}\big(\chi(E', \ell) \ge m\big)\, f_\tau\, d\mu. \tag{7.16}$$



Now we use Assumption IV for the interval $I = [E' - N^{-1+\sigma}, E' + N^{-1+\sigma}]$ with $\sigma$ chosen in such a way that $N^{\sigma} \le M^{1/2}$. Clearly $\mathcal{N}_I \ge \chi(E', \ell)$ for sufficiently large $N$, thus we get from (2.14) that

00 „ 00 

m=M m=M 



holds for any a 6 N. By the choice of <r, we get that ^/rn > iV <T for any m > M, and thus choosing 
a = k(n +1), we get 

Together with (|7.14p . we have thus proved that 



9 < CM'^Ub^N- 1 ^ +b- l / 2 N £ '- & / 2 ) + -% T . (7.17) 



Choosing M such that M n = N 6 ' and then choose large enough so that the last term £ q l 1 is 



M fc - r 

+- f no loof form _ 



smaller than, say, iV 2 . We have thus proved that 

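$$\Theta \le C N^{2\varepsilon'} \Big[ b^{-1} N^{-\frac{1+2\varepsilon}{3}} + b^{-1/2} N^{-\delta/2} \Big] \tag{7.18}$$

for $\tau = N^{-2\varepsilon+\delta}$ and this concludes (2.15).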

For the proof of (2.17), we choose $\xi \ge 2N^{-1+A}$, and then by using (2.16) we can estimate $\Omega^+_{J,m}$ directly as

$$\int |\Omega^+_{J,m}(x)|\, f_\tau\, d\mu \le N^{-K} \tag{7.19}$$

for any $K > 0$, instead of (7.8). Therefore, the estimate on the right hand side of (7.12) and the subsequent estimates can be replaced by

$$C b^{-1} \xi + C b^{-1} N^{-K} \tag{7.20}$$

provided $\xi \ge 2N^{-1+A}$. Choosing $\xi = 2N^{-1+A}$ and following the same proof, we can improve the estimate (7.18) to

$$\Theta \le C N^{2\varepsilon'} \Big[ b^{-1} N^{-1+A} + b^{-1/2} N^{-\delta/2} \Big] \tag{7.21}$$

for $\tau = N^{-2\varepsilon+\delta}$. This proves (2.17) and we have completed the proof of Theorem 2.1.

□
8 Local Marchenko-Pastur law

In this section we establish that the empirical density of eigenvalues for sample covariance matrices is close to the Marchenko-Pastur law even on short scales. We do this by controlling the difference of the Stieltjes transforms, establishing results analogous to Theorem 4.1 and Proposition 4.2 of [16]. In this section, we focus on $0 < d = N/M < 1$; in particular the lower spectral edge satisfies $\lambda_- > 0$. The constants appearing in this section may depend on $d$.

Before the detailed proof, we explain the main steps of the argument, which is similar to the method we have successively developed in [17, 18, 19]. The proof given here is somewhat complicated by the fact that the matrix elements themselves are not independent but are generated as a quadratic expression of independent random variables. The first step, Lemma 8.1, is an a priori bound on the local density on short scales, $\eta \gg 1/N$, using resolvent expansion and a large deviation principle for quadratic forms. Expressing the resolvent of $H$ in terms of the resolvents of its minors, we obtain a self-consistent equation (8.20) for the Stieltjes transform $m_N$ of the eigenvalues. This equation is very close to the defining quadratic equation of the Stieltjes transform $m_W$ of the Marchenko-Pastur law, see (8.7), with a perturbation term $Y(z)$. This term can be estimated by large deviation arguments and using the a priori bound on the local density. Then in Lemma 8.3 we investigate the stability of the self-consistent equation for $m_W$. Although the perturbed equation has two solutions, only one of them can be close to $m_N$. To select the correct solution, we use a continuity argument in the spectral parameter $z$. For $z = z_0$ with a large imaginary part, say $z_0 = 10 + 5i$, the explicit formula (8.27) for the solution can be directly analyzed. For $z$ approaching the real axis, we prove that the two unperturbed solutions remain far away from each other (8.35). Since the perturbed solutions are also continuous in the spectral parameter, for a sufficiently small perturbation they must remain in the vicinity of the correct solution of the unperturbed equation.

This analysis yields a bound on the difference of Stieltjes transforms, $m_N - m_W$. In Lemma 8.5 we give a better bound on $\mathbb{E} m_N - m_W$. The improvement is due to the fact that the perturbation term $Y$ in the self-consistent equation is random and its expectation is much smaller than its typical size (compare (8.17) and (8.48)). Finally, in Lemma 8.6 we give an independent estimate on $\mathbb{E} m_N - m_W$ that is weaker in terms of $\eta = \mathrm{Im}\, z$ but stronger in $\kappa$. When we verify Assumption III in the following Section 9, we will use both bounds simultaneously.

Lemma 8.1 Let $0 < E \le 10$ and $0 < d < 1$. Consider the interval $I_\eta = [E - \eta, E + \eta]$. Let $\mathcal{N}_I$ denote the number of eigenvalues of $H = A^*A$ in the interval $I_\eta$. Suppose that $N^{-1+\varepsilon} \le \eta \le E/2$, for some $\varepsilon > 0$. Then there exist constants $C, c > 0$ such that





for all $N, K$ large enough (independent of $E$).

We remark that the assumption on $\eta$ can be relaxed to $CN^{-1} \le \eta \le E/2$, but we do not need this result here. For details, one can refer to Theorem 5.1 of [19].



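Proof of Lemma 8.1. We observe, first of all, that

$$\frac{\mathcal{N}_I}{N\eta} \le \frac{C}{N}\, \mathrm{Im}\, \mathrm{Tr}\, \frac{1}{H - E - i\eta} = \frac{C}{N} \sum_{j=1}^{N} \mathrm{Im}\, \frac{1}{H - z}(j,j),$$

where we defined $z = E + i\eta$. It follows that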

p (Mi > knt]/Ve) < m 



Im 



H - z 



> k/Ve^J 



(8.2) 



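Denoting by $a_1$ the first column of $A$ and by $B$ the $M \times (N-1)$ matrix consisting of the last $N - 1$ columns of $A$, we have

$$H = \begin{pmatrix} a_1 \cdot a_1 & (B^* a_1)^* \\ B^* a_1 & B^* B \end{pmatrix}.$$

Hence

$$\frac{1}{H - z}(1,1) = \frac{1}{a_1 \cdot a_1 - z - a_1 \cdot B (B^*B - z)^{-1} B^* a_1}.$$

Using the identity

$$B (B^*B - z)^{-1} B^* = BB^* (BB^* - z)^{-1},$$

we find

$$\frac{1}{H - z}(1,1) = \frac{1}{a_1 \cdot a_1 - z - a_1 \cdot BB^* (BB^* - z)^{-1} a_1}. \tag{8.3}$$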

Denote by $\mu_\alpha$ ($\alpha = 1, \ldots, N-1$) the eigenvalues of the $(N-1) \times (N-1)$ matrix $B^*B$. The $\mu_\alpha$'s are also the eigenvalues of the $M \times M$ matrix $BB^*$, and the other eigenvalues of $BB^*$ are zeros. Then define $v_\alpha$ ($\alpha = 1, \ldots, N-1$) as the normalized eigenvectors of $BB^*$ associated with the non-zero eigenvalues $\mu_\alpha$, i.e., the matrix elements of $BB^*$ are given by

$$(BB^*)_{ij} = \sum_{\alpha=1}^{N-1} \mu_\alpha\, v_\alpha(i)\, \overline{v_\alpha(j)}. \tag{8.4}$$

Inserting (8.4) into (8.3), we find

$$\frac{1}{H - z}(1,1) = \frac{1}{a_1 \cdot a_1 - z - \frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha \xi_\alpha}{\mu_\alpha - z}},$$

where we defined the quantity $\xi_\alpha = M |a_1 \cdot v_\alpha|^2$ (note that $\mathbb{E}\, \xi_\alpha = 1$). Taking the imaginary part, we find



Im 



1 



H-z 



(1,1) 



< 



1 



< 



CNrj 



E Yl a:\)j, a - E\<n^ a 
where we used the assumption that $\eta \le E/2$. Because the eigenvalues $\mu_\alpha$'s are the eigenvalues
of the (N — 1) x (N — 1) matrix B*B, they are interlaced with the eigenvalues of H and \{a : 
\fi a ~ E\< r)/2}\ >Mi-l. It follows from flO]) that 

f(m i >^P-)<Nf( y £ Q < ^ and M: > *p\ < CNe'*^^ , 

where in the last step we used Lemma 4.7 from [TU]. The claim follows by the assumption that $N\eta \ge N^{\varepsilon}$ and that $N$ and $K$ are large enough. □
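The first step of the proof of Lemma 8.1, the bound of the eigenvalue count $\mathcal{N}_I$ by the imaginary part of the normalized trace of the resolvent, holds for any real spectrum. A minimal numerical sketch follows, using a hypothetical list of points in place of actual eigenvalues and the explicit constant $C = 2$ (our choice; it follows from $\eta / ((\lambda - E)^2 + \eta^2) \ge 1/(2\eta)$ for $|\lambda - E| \le \eta$).

```python
import random

def stieltjes(eigs, z):
    # m_N(z) = (1/N) sum_j 1 / (lambda_j - z)
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)

def count_in_window(eigs, E, eta):
    # Number of points in [E - eta, E + eta].
    return sum(1 for lam in eigs if abs(lam - E) <= eta)

rng = random.Random(0)
eigs = [rng.uniform(0.0, 4.0) for _ in range(500)]  # hypothetical spectrum, not a real matrix
E, eta = 1.0, 0.05
n_I = count_in_window(eigs, E, eta)
bound = 2 * len(eigs) * eta * stieltjes(eigs, complex(E, eta)).imag
print(n_I, bound)
```

The inequality is deterministic; the probabilistic content of the lemma lies entirely in controlling the diagonal resolvent entries.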

Proposition 8.1 Consider sample covariance matrices $H = A^*A$ with $A$ an $M \times N$ matrix with independent and identically distributed complex entries. Let $0 < d < 1$. Recall $\lambda_-$ and $\lambda_+$ in (3.13) and define $\kappa$ as

$$\kappa = \kappa(E) := |(E - \lambda_-)(E - \lambda_+)|. \tag{8.5}$$

We will often drop the argument $E$ from the notation of $\kappa$ for brevity. Then for any $E$, $\eta$ satisfying $N^{-1+\varepsilon} \le \eta \le \frac{1}{2}E$, $\frac{1}{2}\lambda_- \le E \le 10$, the Stieltjes transform,

$$m_N(z) := \frac{1}{N}\, \mathrm{Tr}\, \frac{1}{H - z}, \qquad z = E + i\eta,$$

of the empirical eigenvalue distribution of $H = A^*A$ satisfies

$$\mathbb{P}\left( |m_N(E + i\eta) - m_W(E + i\eta)| \ge \frac{\delta}{\sqrt{\kappa + \delta}} \right) \le C e^{-c\delta\sqrt{N\eta}}, \tag{8.6}$$

for any $\delta$ small enough (independent of $E$ and $\eta$) and $N \ge 2$. Here $m_W(z)$ is the unique solution of

$$m_W(z) + \frac{1}{z - (1 - d) + z\, d\, m_W(z)} = 0, \tag{8.7}$$

with positive imaginary part for all $z$ with $\mathrm{Im}\, z > 0$.

Recall $\lambda_\pm = (1 \pm d^{1/2})^2$ from (3.13). The function $m_W$ defined in (8.7) depends on $d$ and can be written as

$$m_W(z) = \frac{1 - d - z + i\sqrt{(z - \lambda_-)(\lambda_+ - z)}}{2dz}, \tag{8.8}$$

where $\sqrt{\cdot}$ denotes the square root on the complex plane whose branch cut is the negative real line. Explicit calculation shows that $m_W(z)$ is the Stieltjes transform of the Marchenko-Pastur density given in (3.14).
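As a sanity check, one can verify numerically that the explicit expression (8.8), with denominator $2dz$ and the principal branch of the square root, solves the self-consistent equation (8.7) and has positive imaginary part in the upper half plane. The sketch below uses only the standard library; the sample points and the value $d = 0.5$ are arbitrary choices of ours.

```python
import cmath

def mp_edges(d):
    # Spectral edges lambda_- and lambda_+ for ratio d = N/M.
    return (1 - d ** 0.5) ** 2, (1 + d ** 0.5) ** 2

def m_w(z, d):
    # Explicit formula (8.8); cmath.sqrt is the principal branch
    # (branch cut along the negative real line).
    lam_minus, lam_plus = mp_edges(d)
    root = cmath.sqrt((z - lam_minus) * (lam_plus - z))
    return (1 - d - z + 1j * root) / (2 * d * z)

def residual(z, d):
    # Left-hand side of the self-consistent equation (8.7).
    m = m_w(z, d)
    return m + 1 / (z - (1 - d) + z * d * m)

d = 0.5
points = [complex(1.0, 0.01), complex(0.2, 0.001), complex(2.5, 0.1), complex(10.0, 5.0)]
residuals = [abs(residual(z, d)) for z in points]
imag_parts = [m_w(z, d).imag for z in points]
print(residuals, imag_parts)
```

The last sample point is $P_{10} = 10 + 5i$, the starting point of the continuity argument used later in this section.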

Using (8.6) and (3.14), we have the local Marchenko-Pastur law for the number of eigenvalues in a small interval:
Corollary 8.2 Consider an interval $I = [E - \eta, E + \eta] \subset [\lambda_-, \lambda_+]$ within the bulk spectrum. Let $\delta$ be a sufficiently small parameter. Suppose that $E$ and $\eta$ are chosen such that $\delta^{-2} N^{-1+\varepsilon} \le \eta \le C^{-1} \min\{\kappa, \delta^{1/2} \kappa^{3/4}\}$ with a large constant $C$ and with $\kappa = \kappa(E)$ given in (8.5). Then we have the convergence of the counting function, i.e.,

where $\mathcal{N}_\eta(E) = |\{\alpha : |\lambda_\alpha - E| \le \eta\}|$ denotes the number of eigenvalues of $H = A^*A$ in the interval $I = [E - \eta, E + \eta]$.

Proof of Corollary 8.2. The proof of (8.9) follows from the inequality (8.6) with a similar argument as the proof of Proposition 4.1 of [16]. □

We remark that, similarly to Theorem 3.1 in [19], the assumption on the lower bound $\eta \ge N^{-1+\varepsilon}$ can be relaxed to $\eta \ge KN^{-1}$, which yields the local Marchenko-Pastur law on the shortest possible scale, at least away from the spectral edges.

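Proof of Proposition 8.1. Let $a_j$ be the $j$-th column of $A$ and let $B^{(j)}$ be the remaining $M \times (N-1)$ matrix obtained from $A$ after removing the $j$-th column $a_j$. Let $\mu_\alpha^{(j)}$, $v_\alpha^{(j)}$ be the non-zero eigenvalues and the eigenvectors of the matrix $B^{(j)}[B^{(j)}]^*$ and we define $\xi_\alpha^{(j)} = M |a_j \cdot v_\alpha^{(j)}|^2$. Then we have the formula

$$m_N(z) = \frac{1}{N}\, \mathrm{Tr}\, \frac{1}{H - z} = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{H - z}(j,j) = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{a_j \cdot a_j - z - \frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha^{(j)} \xi_\alpha^{(j)}}{\mu_\alpha^{(j)} - z}}$$

that we rewrite as

$$m_N(z) = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{a_j \cdot a_j - z - \frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha^{(j)}}{\mu_\alpha^{(j)} - z} - X^{(j)}}$$

with

$$X^{(j)} = \frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha^{(j)} (\xi_\alpha^{(j)} - 1)}{\mu_\alpha^{(j)} - z}. \tag{8.10}$$

Note that the vector $a_j$ is independent of $\mu_\alpha^{(j)}$ and $v_\alpha^{(j)}$. Therefore, we have

$$\mathbb{E}\, X^{(j)} = 0, \qquad \mathbb{E}\, \xi_\alpha^{(j)} = 1. \tag{8.11}$$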

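Define $m^{(j)}_{N-1}(z) = \frac{1}{N-1}\, \mathrm{Tr}\, ([B^{(j)}]^* B^{(j)} - z)^{-1}$, then

$$\frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha^{(j)}}{\mu_\alpha^{(j)} - z} = \frac{N-1}{M} + \frac{z}{M} (N-1)\, m^{(j)}_{N-1}(z).$$

Hence

$$m_N(z) = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{1 - z - d - z\, d\, m_N(z) + Y^{(j)}(z)} \tag{8.12}$$

with

$$Y^{(j)} = Y^{(j)}(z) = (a_j \cdot a_j - 1) + \frac{1}{M} - \frac{z}{M} \Big( (N-1)\, m^{(j)}_{N-1}(z) - N m_N(z) \Big) - X^{(j)}(z). \tag{8.13}$$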



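For fixed $j$, denote $b = \sqrt{M}\, a_j$ with $b = (b_1, \ldots, b_M)$. Drop the superscript $j$ for $\mu_\alpha = \mu_\alpha^{(j)}$, $v_\alpha = v_\alpha^{(j)}$, $B = B^{(j)}$ and $m_{N-1} = m^{(j)}_{N-1}$ for simplicity. We rewrite $X^{(j)}$ as

$$X^{(j)} = \sum_{\ell, k = 1}^{M} a_{\ell k} \big[ b_\ell \bar{b}_k - \mathbb{E}\, b_\ell \bar{b}_k \big]$$

with

$$a_{\ell k} = \frac{1}{M} \sum_{\alpha=1}^{N-1} \frac{\mu_\alpha\, \overline{v_\alpha(\ell)}\, v_\alpha(k)}{\mu_\alpha - z}.$$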

So with ±A_ < £ = i?e(z) < 10 and iV- 1+£ < rj < E/2, we have 

Ei |2 1 ^ 



< 



CM 



M 2 ^ \n Q - z\ 2 ~ Mr] 



with some fixed large K (using dyadic decomposition and (|8.ip . similarly to the argument in Lemma 
4.2 of [TO]) apart from an event of probability e~ Cy /^i , Then with Proposition 4.5 of [19] . we have 

P(max|X^| > 5) < Ce"^, (8.14) 

3 

for sufficiently small $\delta > 0$. Since the eigenvalues of $B^*B$ are interlaced with the eigenvalues of $H = A^*A$, we have

$$|(N-1)\, m_{N-1}(z) - N m_N(z)| \le C\eta^{-1}. \tag{8.15}$$

Then using $\mathbb{E}\, a_j \cdot a_j = \frac{1}{M} \sum_{i=1}^{M} \mathbb{E} |b_i|^2 = 1$ and Proposition 4.5 of [19] for the iid variables $b_i$'s, we obtain

$$\mathbb{P}\Big( |a_j \cdot a_j - 1| \ge K M^{-1/2} \Big) \le C e^{-c \min\{K, K^2\}}. \tag{8.16}$$

Combining (8.14), (8.15) and (8.16), we find that






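for sufficiently small $\delta > 0$. With the assumption $\eta \le \mathrm{Re}(z)/2 = E/2$ and $N\eta \ge N^{\varepsilon}$, we obtain

$$\mathbb{P}\Big( \max_j |Y^{(j)}(z)| \ge \delta \Big) \le C e^{-c\delta\sqrt{N\eta}}. \tag{8.17}$$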

On the other hand, with the definition of for anyj, z, z' such that, \z\, \z'\ < 10, Im(z),Im(z') > 
77, we have 



(W^Hz) - Y®(z?) > tT 2 \z - z'f) < e 



(8.18) 



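Together with (8.17), we obtain, for $N^{-1+\varepsilon} \le \eta \le \frac{1}{2}E$, $\frac{1}{2}\lambda_- \le E \le 10$ and sufficiently small $\delta$,

$$\mathbb{P}\Big( \max_{z' \in L(z, P_{10})} \max_j |Y^{(j)}(z')| \ge \delta \Big) \le C e^{-c\delta\sqrt{N\eta}}, \qquad P_{10} = 10 + 5i, \tag{8.19}$$

where $L(z, P_{10})$ is the line segment connecting the points $z$ and $P_{10}$. Then Proposition 8.1 follows from the next lemma. □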


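Lemma 8.3 Assume $H$ is an $N \times N$ positive semidefinite matrix with $\|H\| \le 5$. For fixed $0 < d < 1$, we recall the notation $\lambda_\pm = (1 \pm \sqrt{d})^2$. Let $z_0 = E + i\eta$ and $N^{-1+\varepsilon} \le \eta \le \frac{1}{2}E$, $\frac{1}{2}\lambda_- \le E \le 10$. Denote by $L(z_0, P_{10})$ the line segment connecting $z_0$ and $P_{10} = 10 + 5i$. Suppose that for any $z \in L(z_0, P_{10})$, the Stieltjes transform $m_N(z) = \frac{1}{N}\, \mathrm{Tr}\, (H - z)^{-1}$ satisfies the following self-consistent relation:

$$m_N(z) = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{1 - z - d - z\, d\, m_N(z) + Y^{(j)}(z)}, \tag{8.20}$$

for some $Y^{(j)}(z)$'s. Then there exists $\delta_0 > 0$ depending only on $d$, such that, whenever

$$\delta = \max_{z \in L(z_0, P_{10})} \max_j |Y^{(j)}(z)| \le \delta_0, \tag{8.21}$$

we have

$$|m_N(z_0) - m_W(z_0)| \le C\delta (\kappa + \delta)^{-1/2} \tag{8.22}$$

with $\kappa = \kappa(E) := |(\lambda_+ - E)(E - \lambda_-)|$.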

Proof of Lemma 8.3. We begin with a special case: $z_0 = P_{10}$. In this case, if $z \in L(z_0, P_{10})$ then $z = z_0 = P_{10}$. With the assumptions on $H$ and $0 < d < 1$, it is easy to see that:



-mN{z)d H 1 



1 



which implies 



|1 — z — d — z dm]y(z)\ > — \z\ 



(8.23) 
(8.24) 
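Inserting it into (8.20), we obtain when $z = P_{10}$,

$$\left| m_N(z) + \frac{1}{z - (1 - d) + z\, d\, m_N(z)} \right| \le C\delta. \tag{8.25}$$

Denote the solutions of

$$S + \frac{1}{z - (1 - d) + z\, d\, S} = \Delta \tag{8.26}$$

by $S^{\Delta}_{\pm}(z)$. Explicit calculation shows

$$S^{\Delta}_{\pm}(z) = \frac{1 - d - z \pm i (1 + d\Delta) \sqrt{(\lambda^{\Delta}_{+} - z)(z - \lambda^{\Delta}_{-})}}{2dz} + \frac{\Delta}{2}, \tag{8.27}$$

where

$$\lambda^{\Delta}_{\pm} = \left( \frac{\sqrt{1 + \Delta(d - d^2)} \pm \sqrt{d}}{1 + \Delta d} \right)^{2}. \tag{8.28}$$

With the notations

$$S_{\pm}(z) = S^{0}_{\pm}(z), \tag{8.29}$$

we note $m_W(z) = S_+(z)$ (see (8.7)). Then the following lemma implies, with $\mathrm{Im}(m_N(z)) > 0$, $\mathrm{Im}\, S_+(P_{10}) > 0$, $\mathrm{Im}\, S_-(P_{10}) < 0$, that (8.22) holds for $z = P_{10}$ if $\delta$ is small enough.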

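Lemma 8.4 Let $S^{\Delta}_{\pm}(z)$ be the solutions of (8.26). Let $z = E + i\eta$ and $\frac{1}{2}\lambda_- \le E \le 10$. For sufficiently small $\Delta$, depending on $d$,

$$\max\big\{ |S^{\Delta}_{+}(z) - S_+(z)|,\ |S^{\Delta}_{-}(z) - S_-(z)| \big\} \le C \frac{\Delta}{\sqrt{\kappa(E) + \Delta}}. \tag{8.30}$$

Proof of Lemma 8.4. First, when $\Delta$ is small enough, an easy calculation shows

$$\Lambda := \max\{ |\lambda^{\Delta}_{\pm} - \lambda_{\pm}| \} \le C|\Delta|. \tag{8.31}$$

Therefore, we have

$$\max\big\{ |S^{\Delta}_{+}(z) - S_+(z)|,\ |S^{\Delta}_{-}(z) - S_-(z)| \big\} \le C|\Delta| + C \left| \sqrt{(\lambda^{\Delta}_{+} - z)(z - \lambda^{\Delta}_{-})} - \sqrt{(\lambda_+ - z)(z - \lambda_-)} \right|. \tag{8.32}$$

Let $a = (z - \lambda_-)(\lambda_+ - z)$ and $b = (z - \lambda^{\Delta}_{-})(\lambda^{\Delta}_{+} - z) - (z - \lambda_-)(\lambda_+ - z)$. Note that $|b| \le C\Lambda$ and therefore, by (8.31), $|b| \le C\Delta$. Hence, (8.30) follows from $|(\lambda_+ - z)(z - \lambda_-)| \ge C\kappa(E)$ and from the inequality

$$\left| \sqrt{a + b} - \sqrt{a} \right| \le C \frac{|b|}{\sqrt{|a| + |b|}}, \tag{8.33}$$

which holds for any complex numbers $a$ and $b$. □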
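The stability statement of Lemma 8.4 can be explored numerically by solving the quadratic equation equivalent to (8.26) directly. The sketch below is ours: it clears the denominator in (8.26), pairs each perturbed root with the nearest unperturbed one, and reports the shift in units of $\Delta / \sqrt{\kappa(E) + \Delta}$. The sample points, $d = 0.5$ and the threshold 5 used in the check are arbitrary; the lemma only asserts that some constant $C$ exists.

```python
import cmath

def roots(z, d, delta):
    # Roots S of S + 1/(z - (1-d) + z d S) = delta, i.e. of
    # z d S^2 + (z - (1-d) - z d delta) S + 1 - delta (z - (1-d)) = 0.
    a = z * d
    b = z - (1 - d) - z * d * delta
    c = 1 - delta * (z - (1 - d))
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

def kappa(E, d):
    # kappa(E) = |(E - lambda_-)(E - lambda_+)|
    lam_minus, lam_plus = (1 - d ** 0.5) ** 2, (1 + d ** 0.5) ** 2
    return abs((E - lam_minus) * (E - lam_plus))

def shift_ratio(z, d, delta):
    # Largest distance from a perturbed root to the nearest unperturbed root,
    # divided by delta / sqrt(kappa + delta).
    base = roots(z, d, 0.0)
    pert = roots(z, d, delta)
    shift = max(min(abs(p - s) for s in base) for p in pert)
    return shift / (delta / (kappa(z.real, d) + delta) ** 0.5)

d = 0.5
bulk_ratio = shift_ratio(complex(1.0, 0.01), d, 1e-3)                   # E in the bulk
edge_ratio = shift_ratio(complex((1 + d ** 0.5) ** 2, 1e-6), d, 1e-4)   # E at the upper edge
print(bulk_ratio, edge_ratio)
```

In the bulk the shift is linear in $\Delta$, while at the edge, where $\kappa = 0$, it is of order $\sqrt{\Delta}$; the single expression $\Delta / \sqrt{\kappa + \Delta}$ interpolates between the two regimes.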

Now we prove (8.22) for the case $z_0 \ne P_{10}$. We first note that the two solutions of (8.20) are $S_{\pm}(z)$ when $Y^{(j)} = 0$. One can check that for $z \in L(z_0, P_{10})$, these two solutions are bounded by some constant $C_1$:

$$|S_{\pm}(z)| \le C_1, \quad \text{i.e.,} \quad |z - (1 - d) + z\, d\, S_{\pm}(z)| \ge C_1^{-1} \tag{8.34}$$

and

$$|S_-(z) - S_+(z)| \ge C \sqrt{\kappa(\mathrm{Re}\, z) + \mathrm{Im}\, z}. \tag{8.35}$$

Furthermore, $|S_-(z_0) - S_+(z_0)|$ can be bounded by $|S_-(z) - S_+(z)|$ for any $z \in L(z_0, P_{10})$ as follows,

$$|S_-(z_0) - S_+(z_0)| \le C \min_{z \in L(z_0, P_{10})} |S_-(z) - S_+(z)|. \tag{8.36}$$

On the other hand, for any $z \in L(z_0, P_{10})$, we claim that if $m_N(z)$ is close to $S_-(z)$ or $S_+(z)$, then it should be really close to $S_-(z)$ or $S_+(z)$, i.e., if

$$\min\{ |m_N(z) - S_-(z)|,\ |m_N(z) - S_+(z)| \} \le C_1^{-1}/20, \tag{8.37}$$

then

$$\min\{ |m_N(z) - S_-(z)|,\ |m_N(z) - S_+(z)| \} \le C \frac{\delta}{\sqrt{\kappa(\mathrm{Re}\, z) + \delta}}. \tag{8.38}$$

To see this, note that (8.37) together with (8.34) imply

\z- (1 -d) + zdm N (z)\ > -Cf 1 . (8.39) 



Then with (8.20) and (8.21), we obtain that (8.25) holds for any $z \in L(z_0, P_{10})$. Using Lemma 8.4 again, we have (8.38).

We have seen that (8.22) and (8.37) (for small $\delta$) hold when $z = P_{10}$. Because $m_N(z)$, $S_{\pm}(z)$ are continuous functions of $z$, with (8.38) and (8.37), we can see that when $\delta$ is small enough, (8.38) holds for every $z \in L(z_0, P_{10})$. This result shows that $m_N(z)$ must be close to at least one of $S_+(z)$ and $S_-(z)$, and it is close to $S_+(z)$ when $z = P_{10}$.

Now we claim that if $m_N(z_0)$ were close to $S_-(z_0)$, i.e.,

$$|m_N(z_0) - S_-(z_0)| \le C \frac{\delta}{\sqrt{\kappa(E) + \delta}}, \tag{8.40}$$

then $m_N(z_0)$ is also close to $S_+(z_0)$, which implies that $m_N(z_0)$ is always close to $S_+(z_0)$.

Again, with the continuity of $m_N(z)$ and $S_{\pm}(z)$ and (8.38), if $m_N(z_0)$ is close to $S_-(z_0)$ in the sense of (8.40), then there exists $z \in L(z_0, P_{10})$ such that $m_N(z)$ is close to both $S_-(z)$ and $S_+(z)$, i.e.,

$$|S_+(z) - S_-(z)| \le 2C \frac{\delta}{\sqrt{\kappa(\mathrm{Re}\, z) + \delta}}, \tag{8.41}$$
which implies

$$|S_+(z) - S_-(z)| \le 2C \frac{\delta}{\sqrt{\kappa(E) + \delta}}. \tag{8.42}$$

Combining (8.42) and (8.36), we obtain

$$|S_+(z_0) - S_-(z_0)| \le C \frac{\delta}{\sqrt{\kappa(E) + \delta}}. \tag{8.43}$$

Together with (8.40), we obtain that $m_N(z_0)$ is still close to $S_+(z_0)$. It means that (8.22) holds for all $z_0$'s in our assumption, using the fact $S_+(z_0) = m_W(z_0)$. This completes the proof of Lemma 8.3. □

The following lemma shows that the expectation value of $m_N(z)$ is close to $m_W(z)$.

Lemma 8.5 Let $z = E + i\eta$, such that $N^{-1+\varepsilon} \le \eta \le \frac{1}{2}E$, $\frac{1}{2}\lambda_- \le E \le 10$, for some $\varepsilon > 0$. Then we have

$$|\mathbb{E}\, m_N(z) - m_W(z)| \le \frac{C}{N\eta\kappa^{3/2}}, \qquad \kappa = |(\lambda_+ - E)(E - \lambda_-)|, \tag{8.44}$$

for large enough $N$ depending on $\varepsilon$.

Proof of Lemma 8.5. Using (8.34) and the estimate (8.6) from Proposition 8.1 we have

$$|\mathbb{E}\, m_N(z)| \le C, \qquad |m_W(z)| \le C \tag{8.45}$$

uniformly in $z = E + i\eta$ within the range $N^{-1+\varepsilon} \le \eta \le \frac{1}{2}E$, $\frac{1}{2}\lambda_- \le E \le 10$.

We can assume that $N\eta\kappa^{3/2}$ is much greater than 1, otherwise (8.44) is trivial. Combining $N\eta\kappa^{3/2} \ge 1$ and $N\eta \ge N^{\varepsilon}$, we obtain $N\eta\kappa \ge N^{\varepsilon/3}$. Using (8.12), we write $\mathbb{E}\, m_N(z)$ as

1 ( N 1 \ 

Em N (z) = --E |£ B _ zd{mNiz) _ KmN{z))+Y U) {z) J • ( 8 - 46 ) 

where i? = z — (1 — d) — z dErriN{z). Then, with (18.6p . we know 

E\m N - Em N \ 2 < O ( — — ). (8.47) 
\NrjK J 

From (jSIinD , (|8TTjl . (I8TT3]) . (|8TT5]l and ([HTTP , we obtain 

EY^(z) =e(±-^({N- l^Uz) - Nm N (z))^ = O (JL 



3.48) 
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and 



E 



(8.49) 



Using (]8.6p . (j8.34p and Nijk > iV £//3 , we obtain that \B\ is bounded from below by a constant Cq. 
Furthermore, for some 5 > 0, 



P (|£ - zd(m7v(^) - E(m N (z))) + Y (j) (z)| < C /2) < Ce 

Denote a = —zd(niN(z) — Etoa^z)) + Y^\z), then 

1 \ 1 



cN" 



E' 



13 



E'(a)S~ 2 + 0(E / a 2 )| J B 



(8.50) 



5.51) 



where E' is the conditional expectation under the condition: \B — zd(rriN(z) — Emjv(z)) + Y^\z)\ > 
Co/2. Because rriN(z) and Y^\z) are bounded from above by a polynomial of M, inserting (|8.50p 
into ()8.5ip . we obtain 



Emjv(z) + 



1 



z — (1 — d) + zdEm^{z) 
Combining this with (j8TTj) . (j8Hgj) and (pT49j) . we obtain: 



< C\E(a)\ + CEa 2 . 



ErriN(z) + 



z — (1 — d) + zdErriN(z) 



< 



C 
Nt)k ' 



Using Lemma [831 we have 



mini \Em N (z) - S + (z)\, \Em N {z) - S_{z)\ \ < 



C 



NriK 3 / 2 ' 



5.52) 



(8.53) 



5.54) 



Using this inequality, and $S_+(z) = m_W(z)$, we can easily obtain (8.44) for $z = P_{10}$. Consider now $z = E + i\eta \ne P_{10}$. If $\mathbb{E}\, m_N(z)$ is closer to $S_-(z)$ than $C(N\eta\kappa^{3/2})^{-1}$, then by the continuity of $\mathbb{E}\, m_N(z)$, there exists $z' \in L(z, P_{10})$ such that $\mathbb{E}\, m_N(z')$ is close to both $S_+(z')$ and $S_-(z')$, i.e.,

$$|S_+(z') - S_-(z')| \le \frac{C}{N\, \mathrm{Im}\, z'\, [\kappa(\mathrm{Re}\, z')]^{3/2}} \le \frac{C}{N\eta\kappa^{3/2}}. \tag{8.55}$$

Together with (8.36), we obtain that $\mathbb{E}\, m_N(z)$ is also close to $S_+(z) = m_W(z)$ and complete the proof. □

Now we give an alternative bound on $\mathbb{E}\, m_N(z)$.
Lemma 8.6 Let $z = E + \eta i$, $N^{-1+\varepsilon} \le \eta \le E/2$, $\lambda_-/2 \le E \le 10$ and $\varepsilon > 0$. Suppose $N\kappa\eta \ge N^{\varepsilon'}$ for some $\varepsilon' > 0$. Then we have

$$|\mathbb{E}\, m_N(z) - m_W(z)| \le \frac{C}{N \eta^{3/2} \kappa^{1/2}}, \tag{8.56}$$

when $N$ is sufficiently large (depending on $\varepsilon'$).

Proof of Lemma Iff, 61 We only prove the case of the real sample covariance matrix. The case of 
the complex sample covariance matrix can be treated similarly. 
First, we show 

E\m N (z) - Em N (z)\ < (8.57) 

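Let $\lambda_\alpha$ and $u_\alpha$ be the eigenvalues and eigenvectors of $H = A^*A$. The derivative of $\lambda_\alpha$ with respect to the $(i,j)$-th matrix element $A_{ij}$ is given by

$$\frac{\partial \lambda_\alpha}{\partial A_{ij}} = 2 (A u_\alpha)(i)\, u_\alpha(j). \tag{8.58}$$

Using $\sum_j u_\alpha(j) u_\beta(j) = \delta_{\alpha\beta}$ and $\sum_i (A u_\alpha)(i) (A u_\beta)(i) = \lambda_\alpha \delta_{\alpha\beta}$, one can obtain the following result, as in (3.3) of [17],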

E\m N (z) - Em N (z)\ 2 < ^E ^ ^ z \^ j ' ( 8 ' 59 ) 

Then with Lemma |8. 11 as in (3.6) of [T7], we obtain f|8.5Tj) . 

Let B = z — (1 — d) + z dEmjv(z), a\ = zd(mN(z) — Em7v(z)) and 02 = Yj for each j. Using 
the assumption Nt]k > N £ for some e' , we obtain that \B\ is bounded from below by a constant 
Co and for some 5 > 0, 

P(|J3-ai -02I < Co/2) < e~ NS . (8.60) 

We have 

E'- = 1 + 0(5- 2 (E'|ai|)) + 4e'(o 2 ) + 0(iT 3 E'(a|)), (8.61) 

B — a\ — a2 B B z 

where E' is the conditional expectation under the condition: | B — zd(m n(z)— Em n(z)) + Y^\z)\ > 
C /2. Combining this with §8W\) . jS37]), ([535]) and (^9]) . we obtain 



EmAr(^) + 



z- (1-d) + zdEm N (z) 



As in the proof of Lemma 18.51 with a continuity argument and Lemma 18.41 we can obtain (|8.56|) 
from (|8.62p and complete the proof. □ 
9 Verifying Assumption III

The following theorem gives the estimate (2.13) for the singular values of the sample covariance matrices.

Theorem 9.1 Assume that the single site distribution $d\nu$ of the entries of $\sqrt{M} A$ satisfies the logarithmic Sobolev inequality (3.10). Recall that $\rho_W$ in (3.14) denotes the density in the Marchenko-Pastur law. Define $\gamma_j \in [\lambda_-, \lambda_+]$ ($j = 1, 2, \ldots, N$) with the relation

$$\int_{-\infty}^{\gamma_j} \rho_W(x)\, dx = \frac{j}{N}. \tag{9.1}$$

Denote by $x_j = \lambda_j^{1/2}$ the singular values of $A$. Then there exists $\delta > 0$, such that

$$\frac{1}{N}\, \mathbb{E} \sum_{j=1}^{N} \big| x_j - \gamma_j^{1/2} \big|^2 \le C N^{-1-\delta}. \tag{9.2}$$
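The classical locations $\gamma_j$ in (9.1) are quantiles of the limiting density and can be computed by numerical integration. The following is a minimal sketch, assuming that the density in (3.14) is the standard Marchenko-Pastur density $\rho_W(x) = \sqrt{(\lambda_+ - x)(x - \lambda_-)} / (2\pi d x)$ on $[\lambda_-, \lambda_+]$ (equation (3.14) itself is not reproduced in this section, so this form is an assumption); the function names and the grid size are ours.

```python
import math

def mp_edges(d):
    return (1 - math.sqrt(d)) ** 2, (1 + math.sqrt(d)) ** 2

def mp_density(x, d):
    # Standard Marchenko-Pastur density for ratio d < 1 (assumed to agree with (3.14)).
    lam_minus, lam_plus = mp_edges(d)
    if x <= lam_minus or x >= lam_plus:
        return 0.0
    return math.sqrt((lam_plus - x) * (x - lam_minus)) / (2 * math.pi * d * x)

def classical_locations(N, d, grid=20000):
    # gamma_j solves int_{-inf}^{gamma_j} rho_W = j/N; we accumulate a midpoint-rule
    # distribution function and interpolate linearly inside each grid cell.
    lam_minus, lam_plus = mp_edges(d)
    h = (lam_plus - lam_minus) / grid
    cdf, j, gammas = 0.0, 1, []
    for k in range(grid):
        mass = mp_density(lam_minus + (k + 0.5) * h, d) * h
        while j < N and cdf + mass >= j / N:
            gammas.append(lam_minus + (k + (j / N - cdf) / mass) * h)
            j += 1
        cdf += mass
    gammas.append(lam_plus)  # gamma_N: the full mass is reached at the upper edge
    return gammas, cdf

gammas, total_mass = classical_locations(N=10, d=0.5)
print(gammas, total_mass)
```

Because $\rho_W$ vanishes like a square root at both edges, the spacing of the $\gamma_j$ near the edges is of order $N^{-2/3}$ rather than $N^{-1}$, which is the fact used critically in the proof below.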



Remark. An analogous result holds for the eigenvalues of the Wigner matrices; the proof is similar and we will not give the details here. We only point out that the key ingredients of the argument below are: (i) an a priori bound on the extreme eigenvalues (see Lemma 9.2 and the remark afterwards); (ii) concentration of the local density of states (used in Lemma 9.3). We also critically use the fact that the density of states (semicircle law or Marchenko-Pastur law) has a square root singularity at the edges.

The local semicircle law needed in the analogue of Lemma 9.3 for Wigner matrices has been proven for hermitian matrices (see [16] and references therein) but the proof is valid for symmetric and quaternion self-dual matrices as well. The extension to symmetric matrices is trivial. For the case of quaternion self-dual matrices, the only additional observation is that the non-commutativity of the quaternions is irrelevant in the arguments because the common starting point of our papers [16, 17, 18, 19, 20] is an identity on the diagonal elements of the Green's function that involves only complex numbers. For simplicity, we present it only for the (1,1) diagonal element $G_z(1,1)$ of $(H - z)^{-1}$, where $H$ is an $N \times N$ quaternion self-dual matrix and $z \in \mathbb{C}$:



$$G_z(1,1) = \frac{1}{h - z - a^+ \cdot (B - z)^{-1} a} = \frac{1}{h - z - \frac{1}{N} \sum_{\alpha=1}^{N-1} \frac{\xi_\alpha}{\lambda_\alpha - z}}, \tag{9.3}$$

in particular, $G_z(1,1)$ is a diagonal quaternion, thus it can be identified with a complex number via the identification (|574"]). Here $h \in \mathbb{R}$, $a \in \mathbb{H}^{N-1}$ and $B$ is an $(N-1) \times (N-1)$ quaternion self-dual matrix obtained from the following decomposition of $H$:

$$H = \begin{pmatrix} h & a^+ \\ a & B \end{pmatrix}. \tag{9.4}$$

The real numbers $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{N-1}$ denote the eigenvalues of $B$ and the nonnegative real numbers $\xi_\alpha$ are given by

$$\xi_\alpha = N (a^+ \cdot u_\alpha)(u_\alpha^+ \cdot a) = N |a^+ \cdot u_\alpha|^2,$$

where $u_\alpha \in \mathbb{H}^{N-1}$ is the normalized eigenvector of $B$ associated with the eigenvalue $\lambda_\alpha$. The dot product of two quaternionic vectors $a, b \in \mathbb{H}^{N-1}$ is defined as

$$a^+ \cdot b := \sum_{n=1}^{N-1} a_n^+ b_n.$$

The proof of (9.3) is a straightforward computation. This identity is the key to extending our results on the local semicircle law to quaternion self-dual matrices without any further modifications.
Now we return to the proof of Theorem 9.1 and we start with some preparatory lemmas. First, we recall the following result from [22].

Lemma 9.2 (Corollary V.2.1 of [22]) Define

$$n^{\lambda}(E) = \frac{1}{N}\, \mathbb{E}[\#\{\lambda_j \le E\}], \tag{9.5}$$

then $n^{\lambda}(\lambda_- - N^{-1/5}) \le C e^{-N^{\varepsilon}}$ and $1 - n^{\lambda}(\lambda_+ + N^{-1/5}) \le C e^{-N^{\varepsilon}}$ for some $\varepsilon > 0$. Therefore for any $1 \le j \le N$,

$$\lambda_- - C N^{-1/5} \le \mathbb{E}\lambda_j \le \lambda_+ + C N^{-1/5}. \tag{9.6}$$

Remark. In fact, the error term in [22] is $N^{-2/3+\varepsilon}$ instead of $N^{-1/5}$ but we will use only the weaker bound (9.6) in order to indicate that our proof goes through for Wigner matrices with not necessarily symmetric distributions as well. In the latter case only $N^{-1/4+\varepsilon}$ has been proven by Vu in [42] for compactly supported distribution $\nu$ with an effective dependence of the constant $C$ on the support. This effective dependence is necessary to remove the compact support condition as in Lemma 4.1 of [20]. Strictly speaking, the result in [32] was stated only for symmetric Wigner matrices but it holds for hermitian and quaternion self-dual Wigner matrices as well, since the key estimate (equation (5) in [32]) is independent of the matrix ensemble.

Lemma 9.3 Recall that $\rho_W$ in (3.14) denotes the density in the Marchenko-Pastur law. Let

$$n_W(E) = \int_{-\infty}^{E} \rho_W(x)\, dx, \tag{9.7}$$

then

$$\int_0^{\infty} \big| n^{\lambda}(E) - n_W(E) \big|\, dE \le C N^{-6/7}, \tag{9.8}$$

and

$$\sup_E \big| n^{\lambda}(E) - n_W(E) \big| \le C N^{-3/7}. \tag{9.9}$$
Proof of Lemma 9.3. To prove (9.8), with Lemma 9.2, one only needs to prove

$$\int_{\lambda_-/2}^{2\lambda_+} \big| n^{\lambda}(E) - n^{\lambda}(\lambda_-/2) - n_W(E) \big|\, dE \le C N^{-6/7}. \tag{9.10}$$

To this end, we first note that $|\mathbb{E}\, m_N(z)|$ is bounded uniformly for $z = E + i\eta$, such that $N^{-1+\varepsilon} \le \eta \le E/2$ and $\lambda_-/2 \le E \le 10$ (see (8.45)). Moreover, by (8.56) and (8.44), the conditions of Lemma B.1 in [16] are satisfied and thus we obtain (9.10). Following the proof of Lemma B.1 in [16], we see that this lemma is still valid if $(\log N)^4$ in (B.2), (B.3) and (B.4) is replaced with $N^{\varepsilon}$ for small enough $\varepsilon > 0$.

To prove (9.9), for fixed $E$, we can assume $n^{\lambda}(E) > n_W(E)$ and denote $\Delta = n^{\lambda}(E) - n_W(E)$.
Because n x (E) is an increasing function and the derivative of n x v (E) is bounded by ||/9w/||oo = 
d 2 )" 1 ' 2 , we have: 

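$$n^{\lambda}(E') - n_W(E') \ge \Delta - C(E' - E) \ge 0, \qquad \text{when } E \le E' \le E + C^{-1}\Delta. \tag{9.11}$$

Integrating both sides, we obtain

$$\int_E^{E + C^{-1}\Delta} \big| n^{\lambda}(E') - n_W(E') \big|\, dE' \ge O(\Delta^2). \tag{9.12}$$

Using (9.8), it follows that $\Delta \le O(N^{-3/7})$. □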
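The passage from the integrated bound (9.8) to the uniform bound (9.9) rests on the area of a triangle: integrating the lower bound in (9.11) gives $\Delta^2 / (2C)$. A short numerical sketch of this step follows, with hypothetical values for the slope bound and with the constant in (9.8) set to 1.

```python
def triangle_integral(delta, slope, steps=1000):
    # Midpoint rule for the integral of delta - slope * t over 0 <= t <= delta / slope.
    h = (delta / slope) / steps
    return h * sum(delta - slope * (k + 0.5) * h for k in range(steps))

def sup_bound(l1_bound, slope):
    # If delta**2 / (2 * slope) <= l1_bound, then delta <= sqrt(2 * slope * l1_bound).
    return (2 * slope * l1_bound) ** 0.5

N = 10 ** 7
l1_bound = N ** (-6 / 7)  # right-hand side of (9.8) with C = 1 (illustrative)
slope = 1.0               # hypothetical bound on the derivative of n_W
delta_max = sup_bound(l1_bound, slope)
print(delta_max, N ** (-3 / 7))
```

The exponent is halved, from $6/7$ to $3/7$, because the integrated bound controls $\Delta^2$ rather than $\Delta$.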

Similarly to the calculation in Theorem 3.1 of [17], by using the logarithmic Sobolev inequality, we have

Lemma 9.4 For $j, K \in \mathbb{N}$, such that $j + K \le N + 1$, let $\lambda_{j,K} = K^{-1} \sum_{i=0}^{K-1} \lambda_{j+i}$. Then for any $\delta > 0$

$$\mathbb{P}\Big( |\lambda_{j,K} - \mathbb{E}(\lambda_{j,K})| \ge N^{-1/2+\delta} K^{-1/2} \Big) \le C e^{-N^{\delta/2}}, \tag{9.13}$$

with $C$ depending on $\delta$.

□

We will say that an event $A$, depending on $N$, occurs with an extremely high probability if $\mathbb{P}(A) \ge 1 - N^{-C}$ for any $C$ and sufficiently large $N$.



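Lemma 9.5 Recall that $\alpha_j$ is defined as $\mathbb{E}(\lambda_j)$. Suppose that there exist sufficiently small positive numbers $\varepsilon_1$, $\varepsilon_2$ and $\varepsilon_3$, such that

$$\lambda_- + N^{-2\varepsilon_2} \le \lambda_{j_-} \le \lambda_- + N^{-\varepsilon_2}, \qquad \lambda_+ - N^{-2\varepsilon_2} \ge \lambda_{j_+} \ge \lambda_+ - N^{-\varepsilon_2} \tag{9.14}$$

and that

$$|\lambda_j - \alpha_j| \le N^{-1/2-\varepsilon_3}, \qquad \text{for } j_- \le j \le j_+, \tag{9.15}$$

hold with an extremely high probability, where we introduced the notations $j_- = N^{\varepsilon_1}$ and $j_+ = N - N^{1-\varepsilon_1}$. Then for some $\varepsilon > 0$, we have

$$\frac{1}{N} \sum_j |\alpha_j - \gamma_j|^2 \le N^{-1-\varepsilon}. \tag{9.16}$$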

Proof. By symmetry, we only need to prove that (9.16) holds for the sum on the indices $j$ with $\gamma_j < \alpha_j$. Introduce the notation

$$n^{\alpha}(E) := \frac{1}{N}\, \#[\{\alpha_j \le E\}]. \tag{9.17}$$

The estimate (9.13), with $K = 1$, implies that $\max_j |\lambda_j - \alpha_j| \le N^{-1/2+\delta}$ holds with an extremely high probability, for any positive $\delta$. Therefore, we can bound $n^{\alpha}(E)$ from above by (for any $E$)

$$n^{\lambda}(E + N^{-1/2+\delta}) = \frac{1}{N}\, \mathbb{E}[\#\{\lambda_j \le E + N^{-1/2+\delta}\}] \ge \frac{1}{N}\, \#[\{\alpha_j \le E\}] - C N^{-100} = n^{\alpha}(E) - C N^{-100}. \tag{9.18}$$

Similarly, we can obtain the lower bound. Putting them together, we have that

$$C N^{-100} + n^{\lambda}(E + N^{-1/2+\delta}) \ge n^{\alpha}(E) \ge n^{\lambda}(E - N^{-1/2+\delta}) - C N^{-100} \tag{9.19}$$

holds for any $E$. The assumption (9.14) implies that

$$\lambda_j \notin [\lambda_- + N^{-\varepsilon_2}, \lambda_+ - N^{-\varepsilon_2}] \tag{9.20}$$

holds with an extremely high probability, for any $j < j_-$ or $j > j_+$. For the other $j$'s, for which $\lambda_j$ may appear in $[\lambda_- + N^{-\varepsilon_2}, \lambda_+ - N^{-\varepsilon_2}]$, we use (9.15) and obtain the following improved bound on $n^{\alpha}(E)$: when $\lambda_- + N^{-\varepsilon_2} \le E \le \lambda_+ - N^{-\varepsilon_2}$,

$$C N^{-100} + n^{\lambda}(E + N^{-1/2-\varepsilon_3}) \ge n^{\alpha}(E) \ge n^{\lambda}(E - N^{-1/2-\varepsilon_3}) - C N^{-100}. \tag{9.21}$$

Let F(E) be a continuous and differentiable function, such that A/" _1 / 2_£3 < F(E) < Af _1 / 2+5 , for 
< S < ^min{ei,e 2 ,e 3 }, F(E) = N~ l / 2 ~ £ ^ for A_ +2A^ £2 < E < X + -2N~ £2 , F(E) = j\r _1 / 2+<5 
for E < X^+N- £2 or E > A+ - N~ 62 and \F'(E)\ < N~ 5 . Combining (I9T2T]) and (l9"T^|) . we obtain 

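$$C N^{-100} + n^{\lambda}(E + F(E)) \ge n^{\alpha}(E) \ge n^{\lambda}(E - F(E)) - C N^{-100}. \tag{9.22}$$

On the other hand, we have

$$\alpha_j - \gamma_j = \int \mathbf{1}\Big( n_W(E) > \frac{j}{N} > n^{\alpha}(E) \Big)\, dE, \tag{9.23}$$

for any $j$ such that $\alpha_j > \gamma_j$. Therefore we can write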
^ E l«i-^'! 2 (9-24) 



iV 



= I E / X, <s 1 - ^ > n ° (ii;) ) 1 ( nw(ii;/) - i > nQ(jB,) ) dEdE ' 

= | E / / £;<s 1 ^ £ > ^ + ^~ 10 °) 1 (nw(E') > jj > n a (E')^ dEdE', 

where in the second line we used the fact that the difference between j/N and n a {E) must be a 
multiple of N~ 1 . Since maxj Xj < 10 holds with an extremely high probability, using (|9.22p . we 
can replace n a (E) with n x (E - F(E)) in (pHl|) . i.e., 

1 K-7/-iV 10 (9.25) 

^ | E / j E , <E 1 ( ra ^) ^ jj > nX ( E ~ F ( E ») 1 (^ W {E>) > A > n Q (£')) d£d£'. 

Change the variable from E to = E - F{E). With < N~ s , we obtain F(i) = (1 + 

0{N~ s ))F{E) and dt/dE = (1 + 0(N~ S )). Thus 



1 J] l«,-7/-iV- 10 (9-26) 

^ | E / X, <s(t) 1 + 2F{t)) nX{t) ) 1 { nw{E ' ] - Jr > nQ(i?/) ) dMjB '' 

where -E(£) is the inverse function of £(-£?) . Note, when 1 (• • • ) 1 (• • • ) = 1 in (|9.26p . we have 
nw(E') > j/N > n\(t). Define the inverse function of n\y as ra^(l) = A+, (0) = A_ and 
n^(n\y(x)) = x for < nyv{x) < 1. Then 

t + 2F{t) >E>E'> n^{n x {t)). (9.27) 

Inserting this inequality into f)9.26|) and performing the dE' integration, we can see that 

^ E \a j -i J \ 2 -^<^Y,[ 1 ( n ^ t + 2F ^^i >nX ^)\ t - n w 

<C j {\n w {t + 2F(t)) - n x (t)\ + A^" 1 ) • \t - n^(n x (t)) + 2F(t)\dt 
<C(A 1 + A 2 +A 3 +A 4 ), (9.28) 
where we expanded (9.28) into four terms:

$$\begin{aligned}
A_1 &= \int |n_W(t + 2F(t)) - n_W(t)|\,F(t)\,dt, \\
A_2 &= \int \big(|n_W(t) - n_\lambda(t)| + N^{-1}\big)\,F(t)\,dt, \\
A_3 &= \int |n_W(t + 2F(t)) - n_W(t)|\cdot\big|t - n_W^{-1}(n_\lambda(t))\big|\,dt, \\
A_4 &= \int \big(|n_W(t) - n_\lambda(t)| + N^{-1}\big)\cdot\big|t - n_W^{-1}(n_\lambda(t))\big|\,dt.
\end{aligned}$$

Since $n_W'(t) = \rho_W(t) \le C$, $F(t) = N^{-1/2-\varepsilon_3}$ when $\lambda_- + 2N^{-\varepsilon_2} < t < \lambda_+ - 2N^{-\varepsilon_2}$ and $F(t) \le N^{-1/2+\delta}$ for any $t$, we obtain $A_1 \le N^{-1-\varepsilon}$ for some $\varepsilon > 0$. Next, from (9.8) and $F(t) \le N^{-1/2+\delta}$ for any $t$, we can see $A_2 \le (N^{-6/7} + N^{-1})N^{-1/2+\delta} \le N^{-1-\varepsilon}$.
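To prove $A_3 \le N^{-1-\varepsilon}$, we start with writing $A_3$ as

$$A_3 = \Big[\int_{\lambda_+<t} + \int_{t<\lambda_-} + \int_{\lambda_-<t<\lambda_-+E_1} + \int_{\lambda_+-E_1<t<\lambda_+} + \int_{\lambda_-+E_1<t<\lambda_+-E_1}\Big]\,\Xi(t)\,dt, \tag{9.29}$$

where we set $E_1 = N^{-1/4}$ and

$$\Xi(t) = |n_W(t + 2F(t)) - n_W(t)|\cdot\big|t - n_W^{-1}(n_\lambda(t))\big|.$$

The first term on the r.h.s. of (9.29) is equal to zero, since $n_W$ is constant outside $[\lambda_-, \lambda_+]$. The second term can be bounded by $N^{-1-\varepsilon}$, for some $\varepsilon > 0$, using the facts $F(t) \le N^{-1/2+\delta}$ and $n_W(\lambda_- + E) \le CE^{3/2}$, i.e.,

$$\int_{t<\lambda_-} \Xi(t)\,dt \le C\int_{\lambda_- - N^{-1/2+\delta}}^{\lambda_-} |F(t)|^{3/2}\,dt \le N^{-1-\varepsilon}. \tag{9.30}$$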



Now we prove that the third and fourth terms of (9.29) are less than $N^{-1-\varepsilon}$, for some $\varepsilon > 0$. From the explicit definition of $n_W$, an easy calculation shows that, for all $t \in (\lambda_-, \lambda_+)$,

$$\big|t - n_W^{-1}(s)\big| \le C|n_W(t) - s|^{2/3}, \tag{9.31}$$

which in particular implies that

$$\max_{t\in(\lambda_-,\lambda_+)}\Big(\max_{|s - n_W(t)| \le CN^{-3/7}} \big|t - n_W^{-1}(s)\big|\Big) \le CN^{-2/7}. \tag{9.32}$$

Combining this with the fact $|n_W(t + 2F(t)) - n_W(t)| \le C\|n_W'\|_\infty F(t) \le CN^{-1/2+\delta}$, we obtain that the third and fourth terms of (9.29) are less than $CN^{-2/7}N^{-1/2+\delta}N^{-1/4} \le N^{-1-\varepsilon}$, for some $\varepsilon > 0$.
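
As a quick check of the exponents in this last step, the three factors quoted above can be multiplied out. This is only a bookkeeping sketch; the admissible range of $\varepsilon$ stated in the comment is our own reading and is not claimed in the text.

```latex
\[
  N^{-2/7}\cdot N^{-1/2+\delta}\cdot N^{-1/4}
  = N^{-\left(\frac{8}{28}+\frac{14}{28}+\frac{7}{28}\right)+\delta}
  = N^{-29/28+\delta}.
\]
% Hence C N^{-29/28+\delta} \le N^{-1-\varepsilon} for large N
% as soon as 0 < \varepsilon < 1/28 - \delta (which requires \delta < 1/28).
```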
To bound the last term of (9.29), we use once again the bound $|n_W(t + 2F(t)) - n_W(t)| \le CN^{-1/2+\delta}$. From (9.31), we find therefore that

$$\begin{aligned}
\int_{\lambda_-+E_1<t<\lambda_+-E_1} \Xi(t)\,dt
&\le CN^{-1/2+\delta}\int |n_W(t) - n_\lambda(t)|^{2/3}\,dt \\
&\le CN^{-1/2+\delta}\Big(\int |n_W(t) - n_\lambda(t)|\,dt\Big)^{2/3} \le CN^{-1-\varepsilon}.
\end{aligned}$$
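
The passage from the $2/3$-power inside the integral to the $2/3$-power outside is a Hölder step on a bounded interval. The sketch below spells it out, assuming the integration interval $I$ has length $|I| \le C$ and that $\int |n_W - n_\lambda|\,dt \le CN^{-6/7}$, which is the size the text uses when bounding $A_2$; the symbol $f$ is introduced here only for illustration.

```latex
% Hoelder with exponents p = 3/2 and q = 3, applied to f = n_W - n_\lambda on I:
\[
  \int_I |f(t)|^{2/3}\,dt
  \;\le\; \Big(\int_I |f(t)|\,dt\Big)^{2/3}\Big(\int_I 1\,dt\Big)^{1/3}
  \;=\; |I|^{1/3}\Big(\int_I |f(t)|\,dt\Big)^{2/3}.
\]
% With \int_I |f| \le C N^{-6/7} this gives
\[
  C N^{-1/2+\delta}\cdot\big(N^{-6/7}\big)^{2/3}
  = C N^{-1/2-4/7+\delta}
  = C N^{-15/14+\delta},
\]
% which is below N^{-1-\varepsilon} for large N when \varepsilon < 1/14 - \delta.
```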



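At last, we prove $A_4 \le N^{-1-\varepsilon}$. We rewrite $A_4$ as

$$A_4 = \int_{t\notin(\lambda_-,\lambda_+)} \widetilde\Xi(t)\,dt + \int_{t\in(\lambda_-,\lambda_+)} \widetilde\Xi(t)\,dt, \tag{9.33}$$

where $\widetilde\Xi(t) = \big(|n_W(t) - n_\lambda(t)| + N^{-1}\big)\cdot\big|t - n_W^{-1}(n_\lambda(t))\big|$. When $t \notin (\lambda_-, \lambda_+)$, from the preceding estimates and (9.32), one can see that

$$\big|t - n_W^{-1}(n_\lambda(t))\big| \le \max_{|s - n_W(t)| \le CN^{-3/7}} \big|t - n_W^{-1}(s)\big| \le C|t - \lambda_-||t - \lambda_+| + CN^{-2/7}. \tag{9.34}$$

So we have

$$\int_{t\notin(\lambda_-,\lambda_+)} \widetilde\Xi(t)\,dt \le C\int_{t\notin(\lambda_-,\lambda_+)} dt\,\big(|n_\lambda(t) - n_W(t)| + N^{-1}\big)\big(|t - \lambda_-||t - \lambda_+| + N^{-2/7}\big). \tag{9.35}$$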

Using Lemma 9.2 we have

$$\begin{aligned}
(9.35) \le{}& C\int_{\lambda_- - N^{-1/5}}^{\lambda_-} dt\,\big(|n_\lambda(t) - n_W(t)| + N^{-1}\big)\big(|t - \lambda_-||t - \lambda_+| + N^{-2/7}\big) \\
&+ C\int_{\lambda_+}^{\lambda_+ + N^{-1/5}} dt\,\big(|n_\lambda(t) - n_W(t)| + N^{-1}\big)\big(|t - \lambda_-||t - \lambda_+| + N^{-2/7}\big) + N^{-10}.
\end{aligned} \tag{9.36}$$

Here we also used the fact that for large $t$, $|n_\lambda(t) - 1|$ decays exponentially fast to zero (see, for example, Lemma 7.3 of [17], which is stated for matrices with complex entries, but can be trivially extended to the case of real entries). Together with (9.8), we obtain that

$$(9.35) \le C(N^{-6/7} + N^{-1})(N^{-1/5} + N^{-2/7}) + N^{-10} \le N^{-1-\varepsilon} \tag{9.37}$$

for some $\varepsilon > 0$.

When $t \in (\lambda_-, \lambda_+)$, with (9.32) and (9.8), we can see

$$\int_{t\in(\lambda_-,\lambda_+)} \widetilde\Xi(t)\,dt \le CN^{-6/7}N^{-2/7}. \tag{9.38}$$

Combining (9.37) and (9.38), we obtain $A_4 \le N^{-1-\varepsilon}$ for some $\varepsilon > 0$. Together with (9.28), this completes the proof of Lemma 9.5. □
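
For orientation, the powers of $N$ appearing in (9.37) and (9.38) can be compared with $N^{-1}$ directly. The following is a bookkeeping sketch using only the exponents printed in those two displays; the explicit ranges for $\varepsilon$ are our own arithmetic.

```latex
% (9.37): the dominant product is N^{-6/7} * N^{-1/5}, since
% N^{-1} \le N^{-6/7} and N^{-2/7} \le N^{-1/5}:
\[
  (N^{-6/7}+N^{-1})(N^{-1/5}+N^{-2/7}) \le 4\,N^{-6/7-1/5} = 4\,N^{-37/35}.
\]
% (9.38):
\[
  N^{-6/7}\,N^{-2/7} = N^{-8/7}.
\]
% Both are below N^{-1-\varepsilon} for large N whenever \varepsilon < 2/35.
```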

Next, we show that the assumptions (9.14) and (9.15) in Lemma 9.5 always hold. First we prove (9.14) in Lemma 9.6 below, with a proof analogous to that of Lemma 9.5. Then in Lemma 9.7 we show that (9.15) holds when (9.14) holds.

Lemma 9.6 There exist small positive numbers $\varepsilon_1$ and $\varepsilon_2$ such that

$$\lambda_- + N^{-2\varepsilon_2} \le \lambda_{j_-} \le \lambda_- + N^{-\varepsilon_2}, \qquad \lambda_+ - N^{-2\varepsilon_2} \ge \lambda_{j_+} \ge \lambda_+ - N^{-\varepsilon_2} \tag{9.39}$$

hold with an extremely high probability, where we recall the notations $j_- = N^{1-\varepsilon_1}$ and $j_+ = N - N^{1-\varepsilon_1}$.

Proof. As in (9.19), for any $E$, $\delta > 0$ and sufficiently large $N$, we have

$$CN^{-100} + n_\lambda(E + N^{-1/2+\delta}) \ge n_\alpha(E) \ge n_\lambda(E - N^{-1/2+\delta}) - CN^{-100}. \tag{9.40}$$

So without any other assumptions, one can obtain (9.28) if we set $F(E) = N^{-1/2+\delta}$ instead of the $F(E)$ defined in the proof of Lemma 9.5. With an argument similar to the proof of Lemma 9.5, but with this redefined $F(E)$, we have

lj2hi-li\ 2 <CN-^. (9.41) 

i 

Then we claim that (9.41) implies that

$$\sup_j |\alpha_j - \gamma_j| \le N^{-1/10}. \tag{9.42}$$

We prove this claim by contradiction: assume that for some $j_0$ we have $|\alpha_{j_0} - \gamma_{j_0}| > N^{-1/10}$. By symmetry we can assume that $j_0 \le N/2$; the case $j_0 > N/2$ is analogous. We start with the case $j_0 \le N^{1/2}$. Then $\gamma_{j_0} \le \lambda_- + CN^{-1/4}$ and in this case $\alpha_{j_0}$ must be larger than $\gamma_{j_0}$, since otherwise $\alpha_{j_0} \le \gamma_{j_0} - N^{-1/10} \le \lambda_- - \frac{1}{2}N^{-1/10}$ would contradict $\alpha_{j_0} \in [\lambda_- - CN^{-1/5}, \lambda_+ + CN^{-1/5}]$ (see above). Using

$$|\gamma_i - \gamma_j| \le CN^{-2/3}|i - j| \tag{9.43}$$

for any $i, j$ and that $\alpha_j$ is monotone, we obtain that

$$\alpha_j - \gamma_j \ge \alpha_{j_0} - \gamma_{j_0} - CN^{-1/6} \ge \frac{1}{2}N^{-1/10}$$

for any $j$ such that $j_0 \le j \le j_0 + N^{1/2}$. Then

$$\sum_{j=j_0}^{j_0+N^{1/2}} |\alpha_j - \gamma_j|^2 \ge cN^{3/10} \tag{9.44}$$
with some positive $c > 0$, which would contradict (9.41). Now we consider the case $j_0 > N^{1/2}$. The previous argument remains unchanged if $\alpha_{j_0} > \gamma_{j_0}$. If $\alpha_{j_0} < \gamma_{j_0}$, then we use

$$\alpha_j - \gamma_j \le \alpha_{j_0} - \gamma_{j_0} + CN^{-1/6} \le -\frac{1}{2}N^{-1/10}$$

for any $j$ such that $j_0 - N^{1/2} \le j \le j_0$ and we obtain

$$\sum_{j=j_0-N^{1/2}}^{j_0} |\alpha_j - \gamma_j|^2 \ge cN^{3/10},$$

which again contradicts (9.41). This completes the proof of (9.42).

On the other hand, the estimate (9.13), with $K = 1$, implies that $\max_j |\lambda_j - \alpha_j| \le N^{-1/2+\delta}$ holds with an extremely high probability. Combining (9.42) with this fact, we can see that for any small enough $\varepsilon_1$, there exists $\varepsilon_2$ such that (9.39) holds, which completes the proof of Lemma 9.6. □

The next lemma guarantees the assumption (9.15) in Lemma 9.5, given (9.14).

Lemma 9.7 If there exist sufficiently small positive numbers $\varepsilon_1$ and $\varepsilon_2$ such that

$$\lambda_- + N^{-2\varepsilon_2} \le \lambda_{j_-} \le \lambda_- + N^{-\varepsilon_2}, \qquad \lambda_+ - N^{-2\varepsilon_2} \ge \lambda_{j_+} \ge \lambda_+ - N^{-\varepsilon_2} \tag{9.45}$$

holds with an extremely high probability, then there exists $\varepsilon_3 > 0$ such that

$$|\lambda_j - \alpha_j| \le N^{-1/2-\varepsilon_3}, \qquad \text{for } j_- \le j \le j_+, \tag{9.46}$$

holds with an extremely high probability, where we recall the notations $j_- = N^{1-\varepsilon_1}$ and $j_+ = N - N^{1-\varepsilon_1}$.

Proof. For simplicity, we only prove the case $j \le N/2$; the case $j > N/2$ is analogous. Using (9.13), for any $N/2 \ge j \ge j_-$, $\delta > 0$, with $K = N^{1/4}$, we have

P(\\j,K ~ E(Aj,a-)| > A^ 5 / 8+<5 ) < Ce"^. (9.47) 
Now we claim that, for $K = N^{1/4}$ and $j_- \le j \le j_+$,

$$|\lambda_{j,K} - \lambda_j| \le N^{-5/8} \tag{9.48}$$

holds with an extremely high probability, which implies

$$|\mathbb{E}\lambda_{j,K} - \mathbb{E}\lambda_j| \le CN^{-5/8}. \tag{9.49}$$
To see (9.48), first notice that

$$\mathbb{P}\big(|\lambda_{j,K} - \lambda_j| \ge N^{-5/8}\big) \le \mathbb{P}\big(\lambda_{j+K} - \lambda_j \ge N^{-5/8}\big). \tag{9.50}$$

Suppose now that $\lambda_{j+K} - \lambda_j \ge N^{-5/8}$. With the assumption (9.45), we have that, for $j_- \le j \le N/2$,

$$\lambda_j \in \big(\lambda_- + N^{-2\varepsilon_2},\ \lambda_+ - N^{-2\varepsilon_2}\big) \tag{9.51}$$

with an extremely high probability. Divide this interval into small intervals of length $\frac{1}{2}N^{-5/8}$. By the local Marchenko-Pastur law, i.e., Corollary 8.2, the event that the number of eigenvalues in each piece is larger than $CN^{1-3\varepsilon_2-5/8}$ holds with an extremely high probability. On the other hand, if $\lambda_{j+K} - \lambda_j \ge N^{-5/8}$, then the total number of eigenvalues in at least one of these intervals is less than $K = N^{1/4}$, which implies that $\lambda_{j+K} - \lambda_j \le N^{-5/8}$ holds with an extremely high probability. Together with (9.50), we have (9.48). Then combining (9.48), (9.49) and (9.47), we obtain (9.46) and complete the proof. □
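
The final combination is a triangle inequality. A sketch, assuming (9.47) bounds the fluctuation of $\lambda_{j,K}$ by $N^{-5/8+\delta}$ with extremely high probability and recalling $\alpha_j = \mathbb{E}\lambda_j$; the explicit condition on $\varepsilon_3$ is our own bookkeeping:

```latex
\[
  |\lambda_j - \alpha_j|
  \le \underbrace{|\lambda_j - \lambda_{j,K}|}_{\le N^{-5/8}\ \text{by (9.48)}}
    + \underbrace{|\lambda_{j,K} - \mathbb{E}\lambda_{j,K}|}_{\le N^{-5/8+\delta}\ \text{by (9.47)}}
    + \underbrace{|\mathbb{E}\lambda_{j,K} - \mathbb{E}\lambda_j|}_{\le C N^{-5/8}\ \text{by (9.49)}}
  \le C N^{-5/8+\delta}.
\]
% Since 5/8 > 1/2, this is at most N^{-1/2-\varepsilon_3} for large N
% whenever 0 < \varepsilon_3 < 1/8 - \delta.
```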



Now we are ready to prove Theorem 9.1.

Proof of Theorem 9.1. Note that the assumptions in Lemma 9.5 are proved in Lemmas 9.6 and 9.7. Combining Lemmas 9.5, 9.6 and 9.7, we obtain (9.16), i.e.,

$$\frac{1}{N}\sum_j |\alpha_j - \gamma_j|^2 \le N^{-1-\varepsilon} \tag{9.52}$$

for some constant $\varepsilon > 0$, where $\alpha_j$ is defined as $\mathbb{E}\lambda_j$. Then we claim that for some constant $\varepsilon > 0$,

$$\frac{1}{N}\sum_j (\lambda_j - \alpha_j)^2 \le N^{-1-\varepsilon} \tag{9.53}$$

holds with an extremely high probability. To see (9.53), first notice that (9.13), with $K = 1$, implies that, for any $\delta$ and $j$,

$$|\lambda_j - \alpha_j| \le N^{-1/2+\delta} \tag{9.54}$$

holds with an extremely high probability. The estimate (9.46) shows that there exist $\varepsilon_1 > 0$ and $\varepsilon_3 > 0$ such that

$$|\lambda_j - \alpha_j| \le N^{-1/2-\varepsilon_3}, \qquad \text{for } N^{1-\varepsilon_1} \le j \le N - N^{1-\varepsilon_1}, \tag{9.55}$$

holds with an extremely high probability. Combining this with (9.54) for the remaining indices $j < N^{1-\varepsilon_1}$ or $j > N - N^{1-\varepsilon_1}$, we obtain (9.53). Together with (9.52), we have

$$\frac{1}{N}\,\mathbb{E}\sum_j |\lambda_j - \gamma_j|^2 \le N^{-1-\varepsilon} \tag{9.56}$$

for some $\varepsilon > 0$. Using the definition $x_j = \lambda_j^{1/2}$, one has

$$\big|x_j - \gamma_j^{1/2}\big| = |\lambda_j - \gamma_j|\,\big(x_j + \gamma_j^{1/2}\big)^{-1} \le C|\lambda_j - \gamma_j|. \tag{9.57}$$

Inserting (9.57) into (9.56), we obtain (9.2) and complete the proof of Theorem 9.1. □
A Existence and restriction of the dynamics

As in Section 2, we consider the Euclidean space $\mathbb{R}^N$ with the normalized measure $\mu = \exp(-N\mathcal{H})/Z$. The Hamiltonian $\mathcal{H}$ is of the form (2.6) or (2.8); for definiteness we discuss the first case, the second case being fully analogous. $\mathcal{H}$ is symmetric with respect to the permutation of the variables $\mathbf{x} = (x_1, \ldots, x_N)$, thus the measure can be restricted to the subset $\Sigma_N \subset \mathbb{R}^N$ defined in (2.4). In this appendix we outline how to define the dynamics (2.1) with its generator, formally given by $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\mathcal{H})\nabla$, on $\Sigma_N$. The condition $\beta \ge 1$ and the specific factors $\prod_{i<j}|x_j - x_i|^\beta$ will play a key role in the argument; in particular, we will see that $\beta = 1$ is the critical threshold for this method to work.

We first recall the standard definition of the dynamics on $\mathbb{R}^N$. The quadratic form

$$\mathcal{E}(f, g) := \frac{1}{2N}\int_{\mathbb{R}^N} \nabla\bar{f}\cdot\nabla g\,d\mu$$

is a closable Markovian symmetric form on $L^2(\mathbb{R}^N, d\mu)$ with a domain $C_0^\infty(\mathbb{R}^N)$ (see Example 1.2.1 and Theorem 3.1.3 of [24]). This form can be closed with a form domain $H^1(\mathbb{R}^N, d\mu)$ defined as the closure of $C_0^\infty(\mathbb{R}^N)$ in the norm $\|\cdot\|_+^2 = \mathcal{E}(\cdot,\cdot) + \|\cdot\|_2^2$. The closure is called the Dirichlet form. It generates a strongly continuous Markovian semigroup $T_t$, $t > 0$, on $L^2$ (Theorem 1.4.1 [23]) and it can be extended to a contraction semigroup on $L^1(\mathbb{R}^N, d\mu)$, $\|T_t f\|_1 \le \|f\|_1$ (Section 1.5 [23]). The generator $L$ of the semigroup is defined via the Friedrichs extension (Theorem 1.3.1 [23]) and it is a positive self-adjoint operator on its natural domain $D(L)$ with $C_0^\infty(\mathbb{R}^N)$ being the core. The generator is given by $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\mathcal{H})\nabla$ on its domain (Corollary 1.3.1 [23]). By the spectral theorem, $T_t$ maps $L^2$ into $D(L)$, thus with the notation $f_t = T_t f$ for some $f \in L^2$, it holds that

$$\partial_t f_t = Lf_t, \quad t > 0, \qquad \text{and} \qquad \lim_{t\to+0}\|f_t - f\|_2 = 0.$$

Moreover, by approximating $f$ by $L^2$ functions and using that $T_t$ is a contraction in $L^1$ (Section 1.5 in [23]), the differential equation holds even if the initial condition $f$ is only in $L^1$. In this case the convergence $f_t \to f$, as $t \to +0$, holds only in $L^1$. We remark that $T_t$ is also a contraction on $L^\infty$, by duality.

Now we restrict the dynamics to $\Sigma = \Sigma_N$. Repeating the general construction with $\mathbb{R}^N$ replaced by $\Sigma_N$, we obtain the corresponding generator $L^{(\Sigma)}$ and the semigroup $T_t^{(\Sigma)}$.

To establish the relation between $L$ and $L^{(\Sigma)}$, we first define the symmetrized version of $\Sigma$,

$$\widetilde\Sigma := \mathbb{R}^N \setminus \{\mathbf{x} : \exists\, i \ne j \text{ with } x_i = x_j\}.$$

Denote $X := C_0^\infty(\widetilde\Sigma)$. The key information is that $X$ is dense in $H^1(\mathbb{R}^N, d\mu)$, which is equivalent to the density of $X$ in $C_0^\infty(\mathbb{R}^N, d\mu)$. We will check this property below. Then the general argument above directly applies if $\mathbb{R}^N$ is replaced by $\widetilde\Sigma$, and it shows that the generator $L$ is the same (with the same domain) if we start from $X$ instead of $C_0^\infty(\mathbb{R}^N, d\mu)$ as a core.
Note that both $L$ and $L^{(\Sigma)}$ are local operators and $L$ is symmetric with respect to the permutation of the variables. For any function $f$ defined on $\Sigma$, we define its symmetric extension onto $\widetilde\Sigma$ by $\tilde f$. Clearly $L\tilde f = \widetilde{L^{(\Sigma)}f}$ for any $f \in C_0^\infty(\Sigma)$. Since the generator is uniquely determined by its action on its core, and the generator uniquely determines the dynamics, we see that for any $f \in L^1(\Sigma, d\mu)$, one can determine $T_t^{(\Sigma)}f$ by computing $T_t\tilde f$ and restricting it to $\Sigma$. In other words, the dynamics (2.1) is well defined when restricted to $\Sigma = \Sigma_N$.

Finally, we have to prove the density of $X$ in $C_0^\infty(\mathbb{R}^N, d\mu)$, i.e., to show that if $f \in C_0^\infty(\mathbb{R}^N)$, then there exists a sequence $f_n \in C_0^\infty(\widetilde\Sigma)$ such that $\mathcal{E}(f - f_n, f - f_n) \to 0$. The structure of $\widetilde\Sigma$ is complicated since, in addition to the one-codimensional coalescence hyperplanes $x_i = x_j$ (and $x_i = 0$ in case of $\Sigma^+$), it contains higher order coalescence subspaces with higher codimensions. We will show the approximation argument in a neighborhood of a point $\mathbf{x}$ such that $x_i = x_j$ but $x_i \ne x_k$ for any other $k \ne i, j$. The proof uses the fact that the measure $d\mu$ vanishes at least to first order, i.e., at least as $|x_i - x_j|$, around $\mathbf{x}$, thanks to $\beta \ge 1$. This is the critical case; the argument near higher order coalescence points is even easier, since they have lower codimension and the measure $\mu$ vanishes at even higher order.

In a neighborhood of $\mathbf{x}$ we can change to local coordinates such that $r := x_i - x_j$ remains the only relevant coordinate. Thus the task is equivalent to showing that any $g \in C_0^\infty(\mathbb{R})$ can be approximated by a sequence $g_\varepsilon \in C_0^\infty(\mathbb{R} \setminus \{0\})$ in the sense that

$$\int_{\mathbb{R}} |g'(r) - g_\varepsilon'(r)|^2\,|r|\,dr \to 0 \tag{A.1}$$

as $\varepsilon \to 0$. It is sufficient to consider only the positive semi-axis, i.e., $r > 0$. Extending the functions to two-dimensional radial functions, $G(x) := g(|x|)$, $G_\varepsilon(x) := g_\varepsilon(|x|)$, this statement is equivalent to the fact that a point in two dimensions has zero capacity.
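
One standard way to see the zero-capacity statement is a logarithmic cutoff. The sketch below is not taken from the text: the cutoff $\chi_\varepsilon$ is a hypothetical choice made here for illustration, it is only Lipschitz, and an additional mollification (omitted) would be needed to obtain a function in $C_0^\infty(\mathbb{R}\setminus\{0\})$.

```latex
% Hypothetical logarithmic cutoff on r > 0, for 0 < \varepsilon < 1:
\[
  \chi_\varepsilon(r) =
  \begin{cases}
    0, & 0 \le r \le \varepsilon^2, \\[2pt]
    \dfrac{\log(r/\varepsilon^2)}{\log(1/\varepsilon)}, & \varepsilon^2 \le r \le \varepsilon, \\[8pt]
    1, & r \ge \varepsilon,
  \end{cases}
  \qquad g_\varepsilon := \chi_\varepsilon\, g .
\]
% Then g' - g_\varepsilon' = (1-\chi_\varepsilon) g' - \chi_\varepsilon' g, and
\[
  \int_0^\infty |g' - g_\varepsilon'|^2\, r\, dr
  \le 2\|g'\|_\infty^2 \int_0^{\varepsilon} r\, dr
    + \frac{2\|g\|_\infty^2}{\log^2(1/\varepsilon)} \int_{\varepsilon^2}^{\varepsilon} \frac{dr}{r}
  = \|g'\|_\infty^2\, \varepsilon^2 + \frac{2\|g\|_\infty^2}{\log(1/\varepsilon)}
  \;\longrightarrow\; 0
\]
% as \varepsilon \to 0. The weight r\,dr is exactly what makes the logarithm appear;
% this is where the first-order vanishing of the measure (\beta \ge 1) is used.
```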

B Bakry-Emery argument on a subdomain 

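The estimate (4.14) in Theorem 4.2 is based on the Bakry-Emery argument [2] for the dissipation of the Dirichlet form. This method uses a lower bound on the Hessian of $\mathcal{H}$ and an integration by parts. Since the dynamics is restricted to $\Sigma = \Sigma_N$, we need to check that the boundary term in the integration by parts vanishes.

In our application, this argument will be used for the Hamiltonian $\widetilde{\mathcal{H}}$ (see (4.5)) and its generator $\widetilde L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\widetilde{\mathcal{H}})\nabla$, but for simplicity, we omit the tilde from the notation below. With $h = h_t = \sqrt{q_t}$, a standard calculation (see (5.8) of [16] with somewhat different notations) shows that

$$\begin{aligned}
\partial_t \frac{1}{2N}\int (\nabla h)^2 e^{-N\mathcal{H}}\,d\mathbf{x}
&= \frac{1}{N}\int \nabla h\,\nabla\Big(Lh + \frac{1}{2N}h^{-1}(\nabla h)^2\Big) e^{-N\mathcal{H}}\,d\mathbf{x} \\
&= \frac{1}{N}\int \Big[\nabla h\,L\nabla h - \frac{1}{2}\nabla h(\nabla^2\mathcal{H})\nabla h + \frac{1}{2N}(\nabla h)\nabla\big[h^{-1}(\nabla h)^2\big]\Big] e^{-N\mathcal{H}}\,d\mathbf{x} \\
&\le -\frac{1}{2N}\int \nabla h(\nabla^2\mathcal{H})\nabla h\; e^{-N\mathcal{H}}\,d\mathbf{x},
\end{aligned} \tag{B.1}$$

assuming that the quantities in each step are well defined and that the boundary term

$$\int_{\partial\Sigma} \partial_i h\,\partial^2_{ij}h\; e^{-N\mathcal{H}} \tag{B.2}$$

in the integration by parts in the third line vanishes. In [16] we argued with a somewhat specific form of $q$, information that is not directly available here.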
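
The sign in the last step of the Bakry-Emery calculation comes from a sum of squares. The following sketch assumes $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\mathcal{H})\nabla$, $d\mu \propto e^{-N\mathcal{H}}d\mathbf{x}$, and that all boundary terms vanish, which is precisely what this appendix sets out to justify.

```latex
% Integration by parts against d\mu (boundary terms assumed to vanish):
\[
  \int \partial_i h \, L \partial_i h \, d\mu
  = -\frac{1}{2N} \int \sum_j (\partial_{ij} h)^2 \, d\mu .
\]
% Expanding the gradient of h^{-1} |\nabla h|^2:
\[
  \frac{1}{2N} \int \nabla h \cdot \nabla\big[ h^{-1} |\nabla h|^2 \big] d\mu
  = \frac{1}{2N} \int \Big[ 2 h^{-1} \sum_{i,j} \partial_i h \, \partial_j h \, \partial_{ij} h
      - h^{-2} |\nabla h|^4 \Big] d\mu .
\]
% Summing the first identity over i and adding the second:
\[
  \int \Big[ \nabla h \, L \nabla h + \frac{1}{2N} \nabla h \cdot \nabla\big[ h^{-1} |\nabla h|^2 \big] \Big] d\mu
  = -\frac{1}{2N} \int \sum_{i,j} \big( \partial_{ij} h - h^{-1} \partial_i h \, \partial_j h \big)^2 d\mu
  \;\le\; 0 ,
\]
% so only the Hessian term -\tfrac12 \nabla h (\nabla^2 \mathcal{H}) \nabla h survives as an upper bound.
```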

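The rigorous proof in the general case uses a regularization and a cutoff argument. First we regularize the function $q = q_t \in D(L)$, $t > 0$, by defining

$$q^\varepsilon(\mathbf{x}) := \frac{q(\mathbf{x}) + \varepsilon}{1 + \varepsilon}, \qquad h_\varepsilon := \sqrt{q^\varepsilon}$$

for some $\varepsilon > 0$. This has the advantage that the derivatives of $h_\varepsilon$ can be bounded by those of $q^\varepsilon$. We consider a cutoff function $\theta \in C_0^\infty(\Sigma)$ to be specified later and we insert $\theta$ in the calculation (B.1). Since $L$ is an elliptic operator with smooth coefficients away from the boundary $\partial\Sigma$, by standard parabolic regularity we know that $q$ and thus $h$ are smooth functions inside $\Sigma$. Thus each step in the cutoff version of (B.1) is justified, with an additional term coming from the derivative hitting $\theta$ in the integration by parts. After repeating the steps in (B.1), we obtain

$$\begin{aligned}
\partial_t \frac{1}{2N}\int \theta\,(\nabla h_\varepsilon)^2 e^{-N\mathcal{H}}\,d\mathbf{x}
&= \frac{1}{N}\int \theta\,\nabla h_\varepsilon\,\nabla\Big(Lh_\varepsilon + \frac{1}{2N}h_\varepsilon^{-1}(\nabla h_\varepsilon)^2\Big) e^{-N\mathcal{H}}\,d\mathbf{x} \\
&\le -\frac{1}{2N}\int_\Sigma \theta\,\nabla h_\varepsilon(\nabla^2\mathcal{H})\nabla h_\varepsilon\; e^{-N\mathcal{H}}\,d\mathbf{x}
 - \frac{1}{2N^2}\int_\Sigma \sum_{i,j}(\partial_i\theta)(\partial_j h_\varepsilon)(\partial^2_{ij}h_\varepsilon)\; e^{-N\mathcal{H}}\,d\mathbf{x}.
\end{aligned} \tag{B.3}$$

We now show that, by an appropriate choice of a sequence of cutoff functions, the second term in (B.3) vanishes. We first define the set of higher order coalescences where at least three points coincide as

$$Q := \{\mathbf{x} \in \partial\Sigma : \exists\, i \text{ s.t. } x_i = x_{i+1} = x_{i+2}\}.$$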
We remark that in case of Assumption I' we formally introduce $x_0 = 0$ to this definition, so that $Q$ will also include three-point singularities of the type $x_1 = x_2 = 0$. For any $\delta > 0$, the set

$$Q_\delta := \{\mathbf{x} \in \Sigma : \mathrm{dist}(\mathbf{x}, Q) \le \delta\}$$

is the $\delta$-neighborhood of the three-point singularity set within $\Sigma$. Introduce an additional small positive parameter $\eta \ll \delta$. We now choose the cutoff function of the form $\theta = \theta_1\theta_2$, depending on the parameters $\delta$ and $\eta$, such that

(i) $\theta_1(\mathbf{x}) \equiv 1$ if $\mathrm{dist}(\mathbf{x}, \partial\Sigma) \ge 2\eta$, $\theta_1(\mathbf{x}) \equiv 0$ if $\mathrm{dist}(\mathbf{x}, \partial\Sigma) \le \eta$ and $|\nabla\theta_1| \le O(\eta^{-1})$;

(ii) $\theta_2(\mathbf{x}) \equiv 1$ if $\mathrm{dist}(\mathbf{x}, Q) \ge 2\delta$, $\theta_2(\mathbf{x}) \equiv 0$ if $\mathrm{dist}(\mathbf{x}, Q) \le \delta$ and $|\nabla\theta_2| \le O(\delta^{-1})$.

Here and in the sequel we make the convention that a quantity of order $\delta^k$ with some $k \in \mathbb{R}$ (sometimes denoted by $O(\delta^k)$) denotes a number that is comparable with $\delta^k$ with implicit constants that may depend on $N$. However, $N$ is fixed in this argument, so this dependence is irrelevant. A similar convention holds for $O(\eta^k)$.

We state two estimates on the solution $q_t$ of (4.13) that will be proven at the end of the section.

Lemma B.1 Assume that $q_0 \in L^\infty$. Then the solution $q_t$ of (4.13) satisfies a uniform supremum bound on the closure of $\Sigma$,

$$\sup_{t \ge 0}\,\sup_{\mathbf{x}\in\overline\Sigma} q_t(\mathbf{x}) < \infty. \tag{B.4}$$

Furthermore, $q_t$ is regular away from the higher order coalescence singularities with the estimate

$$\sup\big\{|\nabla^k q_t(\mathbf{x})| : \mathbf{x} \in \Sigma \cap K,\ \mathrm{dist}(\mathbf{x}, Q) \ge \delta\big\} \le C(t, k, N, K)\,\delta^{-k}, \tag{B.5}$$

where $K$ is a compact set and the constant depends only on the indicated parameters. In particular, $q_t$ is regular up to the boundary $\partial\Sigma \setminus Q_\delta$, i.e., at the two-point coalescence points away from higher order coalescences.

Using this lemma, we can treat the second term on the r.h.s. of (B.3). We split the integration into two regimes. First we consider the regime where $\theta_2\nabla\theta_1 \ne 0$, i.e., a $(2\eta)$-neighborhood of $\partial\Sigma \setminus Q_\delta$. On this set we note that the local density scales as $\eta^\beta$, thanks to the term $|x_i - x_j|^\beta$ in $e^{-N\mathcal{H}}$. Thus the measure of the support of $\nabla\theta$ near $\partial\Sigma \setminus Q_\delta$ scales as $\eta^{1+\beta}$, while $|\nabla\theta| \le C\eta^{-1}$ (assuming $\eta \le \delta$). Since (B.5) guarantees that the derivatives of $h_\varepsilon$ remain locally bounded (with a bound depending on $\varepsilon$, $\delta$, $t$ and $N$), the boundary term near $\partial\Sigma \setminus Q_\delta$ vanishes as $\eta \to 0$.

To estimate the integral on the support of $\nabla\theta_2$, i.e., on a subset of $Q_{2\delta}$, we use that $\theta$ can be replaced with $\theta_2$ after taking the $\eta \to 0$ limit. Since we have $|\nabla\theta_2| = O(\delta^{-1})$ and $|\nabla^k h_\varepsilon| \le C_\varepsilon|\nabla^k q_t^\varepsilon| \le C_{\varepsilon,t,N}\,\delta^{-k}$ with $k = 1, 2$, the integrand scales at most as $\delta^{-4}$. Since the local density scales at least as $\delta^{3\beta}$ due to a factor of the type $|x_i - x_{i+1}|^\beta|x_{i+1} - x_{i+2}|^\beta|x_i - x_{i+2}|^\beta$, the total measure of $Q_{2\delta}$ is of order $\delta^{2+3\beta}$. Hence the integral on $Q_{2\delta}$ scales at most as $\delta^{2+3\beta-4} \le \delta$ in the $\delta$ parameter, and therefore the contribution of the neighborhood of higher order singularities to the second term in (B.3) vanishes as $\delta \to 0$.
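
The two scaling counts can be summarized in one line each. This is only a restatement of the powers quoted in the two paragraphs above, with constants depending on $\varepsilon$, $t$, $N$ (and on $\delta$ in the first line) suppressed.

```latex
% Two-point regime (support of \nabla\theta_1, width \eta):
\[
  \underbrace{\eta^{-1}}_{|\nabla\theta|}\cdot
  \underbrace{O(1)}_{\text{derivatives of } h_\varepsilon}\cdot
  \underbrace{\eta^{1+\beta}}_{\text{measure}}
  = O(\eta^{\beta}) \to 0 \quad (\eta \to 0).
\]
% Three-point regime (support of \nabla\theta_2 inside Q_{2\delta}):
\[
  \underbrace{\delta^{-1}}_{|\nabla\theta_2|}\cdot
  \underbrace{\delta^{-1}}_{|\nabla h_\varepsilon|}\cdot
  \underbrace{\delta^{-2}}_{|\nabla^2 h_\varepsilon|}\cdot
  \underbrace{\delta^{2+3\beta}}_{\text{measure}}
  = \delta^{3\beta-2} \le \delta
  \quad \text{for } \beta \ge 1,\ \delta \le 1 .
\]
```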

After having removed $\theta$ and the second term from (B.3), we let $\varepsilon \to 0$ and this gives the desired result (B.1).

To complete the argument, finally, we need to prove Lemma B.1.

Proof of Lemma B.1. The bound (B.4) follows immediately, since $q_0 \in L^\infty$ and the semigroup $T_t$ is a contraction in $L^\infty$ (see Appendix A).

The second statement of Lemma B.1 follows from a standard regularization argument for a typical two-point singularity at $x_i = x_{i+1}$ that was already outlined in [20]. Fix a point $\mathbf{x}^* \in \partial\Sigma \setminus Q_\delta$ and assume that $x^*_i = x^*_{i+1}$, but for all other pairs $|x^*_j - x^*_{j+1}| \ge \delta$. We remark that the neighborhood of two (or more) independent singularities, e.g., $x_i = x_{i+1}$ and $x_j = x_{j+1}$ with $|i - j| \ge 2$, can be treated similarly by applying the same regularization argument separately. We omit these details here.

Let $B$ be a neighborhood of size $O(\delta)$ around $\mathbf{x}^*$. Choose a local coordinate system $\Phi(\mathbf{x}) = (u, \mathbf{y}) \in \mathbb{R}_+ \times \mathbb{R}^{N-1}$ in $B$ such that $u = \frac{1}{2}(x_{i+1} - x_i) \ge 0$. Within $\Phi(B)$, we can write

$$L = \frac{1}{4N}\Big[\partial_u^2 + \frac{\beta}{u}\partial_u\Big] + L_{\mathrm{reg}},$$

where $L_{\mathrm{reg}}$ is an elliptic operator with second derivatives in the $\mathbf{y}$ variables and with coefficients regular on the scale $\delta$ (since all other singularities are at least at a distance $O(\delta)$ away from $\Phi(B)$).

For the $\beta = 1$ case, by introducing a function $\tilde q_t(a, b, \mathbf{y}) := q_t(\sqrt{a^2 + b^2}, \mathbf{y})$ of $N + 1$ variables, we see that $\tilde q_t$ satisfies $\partial_t\tilde q_t = \widetilde L\tilde q_t$, where

$$\widetilde L = \frac{1}{4N}\big[\partial_a^2 + \partial_b^2\big] + L_{\mathrm{reg}},$$

i.e., $L$ becomes an elliptic operator $\widetilde L$ with bounded and regular coefficients in the new variables. A similar transformation is possible for any integer $\beta \ge 1$, where $u$ is considered as the radial part of a $(\beta + 1)$-dimensional variable.
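
The reason this lift removes the singular drift is the usual formula for the Laplacian of a radial function. A short sketch, where $\varphi$ is a generic smooth profile introduced here only for illustration:

```latex
% beta = 1: for f(a,b) = \varphi(u) with u = \sqrt{a^2 + b^2},
\[
  (\partial_a^2 + \partial_b^2)\, f = \varphi''(u) + \frac{1}{u}\,\varphi'(u).
\]
% General integer beta >= 1: for f(z) = \varphi(|z|) with z \in \mathbb{R}^{\beta+1}, u = |z|,
\[
  \Delta_z f = \varphi''(u) + \frac{(\beta+1)-1}{u}\,\varphi'(u)
             = \varphi''(u) + \frac{\beta}{u}\,\varphi'(u),
\]
% which is exactly the singular part \partial_u^2 + (\beta/u)\,\partial_u of the operator above,
% now expressed through a constant-coefficient Laplacian in the lifted variables.
```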

We claim that the singular point $u = 0$ becomes a removable singularity in the variables $(a, b)$ around $(0, 0)$. Note that the singular set is a codimension two subspace in the $(a, b, \mathbf{y}, t)$ space-time coordinate system, which becomes a line segment in the $(a, b, t)$ space-time system if we disregard the variable $\mathbf{y}$. Note that $\mathbf{y}$ plays no role in this argument since every coefficient is regular in $\mathbf{y}$. The parabolic equation $\partial_t\tilde q_t = \widetilde L\tilde q_t$ holds in a strong sense away from the origin $(a, b) = (0, 0)$ in these two variables, and moreover $\tilde q_t$ is bounded by (B.4). We can thus apply Theorem II of [1] with $p = 2$, $r = \infty$ to see that $\tilde q_t$ must coincide with the regular solution obtained by using the fundamental solution to the equation in a small space-time neighborhood of the singular set. This proves that $\tilde q_t$, and hence $q_t$, is a smooth function up to the boundary $\partial\Sigma \setminus Q_\delta$.

To obtain the quantitative estimate (B.5), we consider the regularity of the coefficients of $L_{\mathrm{reg}}$. Due to the special structure of $\mathcal{H}$, every term in $L = \frac{1}{2N}\Delta - \frac{1}{2}(\nabla\mathcal{H})\nabla$ is either regular on any small scale, or it scales as $(\text{length})^{-2}$. Since the neighborhood $B$ is at least at distance $O(\delta)$ away from the other singularities, the coefficients of $L_{\mathrm{reg}}$ are regular on scale $\delta$. Therefore the solution $q_t$ is regular on scale $\delta$ on $B$ and this gives the $\delta$-scaling of the estimate (B.5). This completes the proof of Lemma B.1. □

Acknowledgement. The authors thank Alice Guionnet for pointing out some errors in the 
preliminary version of the manuscript. 